Framework for abstract:
Introduction to airline industry
What the data is about
customer satisfaction scores from 120,000+ passengers
Nothing about the carrier
Nothing about the price
What the research seeks to find
Predict whether a future customer would be satisfied, given the details of the service
Determine which aspects of the service are most important to customer satisfaction
Methods of analysis
What the data tells us (findings)
Draft of abstract:
After industry-wide setbacks from the COVID-19 pandemic, the airline industry is experiencing a strong recovery in profitability. According to the International Air Transport Association (IATA), projections for airlines worldwide have improved since 2024. The global market is projected to grow at a compound annual growth rate (CAGR 2025-2029) of 4.36%, reaching a market volume of US$771.26bn by 2029 (Statista, 2024). At the same time, low-cost carriers (LCCs) are shifting travelers' traditional perceptions of value and service by offering lower fares with fewer amenities. As traditional full-service airlines compete in this environment, understanding the main drivers of customer satisfaction becomes critical to remaining competitive.
In this research, we analyze data from more than 120,000 airline passengers, including passenger demographics, flight characteristics (delays and distance), and service evaluations. Because the data lacks information on specific airlines and ticket prices, we focus on the service-related factors that universally impact passenger satisfaction. This research aims to (1) predict whether a future customer would be satisfied, given the details of the service, and (2) determine which aspects of the service are most important to customer satisfaction.
In our analysis, we first explore the dataset to understand the variables and their distributions and to detect missing values. Data preparation procedures such as cleaning, normalization, and transformation are conducted before the analysis. We use Principal Component Analysis (PCA) to reduce dimensionality and to identify key variables. Clustering is then applied to gain deeper insight into passenger groups and patterns of service satisfaction. (Placeholder; subject to change.)
In summary… (findings)
In this project, Group 17 acts as business consultants focused on improving customer experience in the airline industry. Our responsibilities range from analyzing customer satisfaction statistics to developing measures that improve the airline's overall performance. The business problem we address is therefore: “How can airlines improve their service strategy and activities to increase customer satisfaction, especially across customer segments and main service areas, in order to achieve stronger competitive advantages?”
We chose the airline industry as our focus for the following reasons:
Low-cost carriers (LCCs) are shifting the traditional perceptions of value and service among travelers by offering lower fares with fewer amenities.
In an intensely competitive market, customer satisfaction can be what sets one airline apart from another: higher customer satisfaction leads to increased client retention and better overall business performance.
Concentrating on major market segments may provide a more general view of the aspects contributing to consumer satisfaction.
Why focus on customer satisfaction?
Enhance customer loyalty: improving these key segments and service areas will raise customer satisfaction, which can in turn increase customer loyalty.
Competitive advantage: by addressing service issues, the airline can stand out from its competitors.
Besides the base R functions, this project uses several packages to extend R's functionality and improve the quality of the data analysis and visualization. This section lists the packages and functions beyond base R that we use at each step of the business analytics process, from data cleaning to modeling.
dplyr : used to filter rows, select columns, sort data, create new columns, and summarize data.
tidyr : used to clean and tidy the data, for example by reshaping it from wide to long format.
skimr : provides summary statistics for each variable in a data frame.
DataExplorer : provides a comprehensive and convenient interface for data exploration and visualization.
Data Transformation
Data Visualization
ggplot2 : used to build more complex and customizable visualizations than the base R plotting functions.
corrplot : used to visualize a correlation matrix in a more visually appealing and informative way.
Data Modeling
Data Analysis
The commands below load the libraries required for the code to run properly. Each package only needs to be installed once (for example with install.packages()); afterwards, we only need to load it with library() in the sections where it is used.
# Importing Library
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(skimr)
## Warning: package 'skimr' was built under R version 4.4.3
library(DataExplorer)
## Warning: package 'DataExplorer' was built under R version 4.4.3
library(corrplot)
## corrplot 0.95 loaded
library(tidyr)
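If one of the `library()` calls above fails, the package is most likely not installed yet. A minimal sketch of a check for missing packages, assuming the package list above is complete; the helper name `find_missing_packages()` is our own and is not part of any package. The install line is left commented out because it needs internet access:

```r
# Packages used in this report
required_pkgs <- c("dplyr", "ggplot2", "skimr", "DataExplorer", "corrplot", "tidyr")

# Hypothetical helper: return the packages that are not installed.
# system.file(package = p) returns "" when package p is not installed.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

missing_pkgs <- find_missing_packages(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them (one-time step)
}
```

Checking with `system.file()` avoids loading the packages, so the check has no side effects on the session.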
In this assignment, we use the Airline Passenger Satisfaction dataset from Kaggle. The data consists of 24 variables for 129,880 passengers. Data cleaning and preparation are required to ensure the data is ready for analysis.
First, we import the data from the CSV file used in the analysis. Before using the data, it is important to understand the data types included in the dataset. Next, examining the dataset's statistical profile helps us prepare the data before using it.
# Read Data
# Note: the CSV file must be placed in the same folder as this file for the code to work properly.
# The dataset can be downloaded from Kaggle or unzipped from our submission in BrightSpace.
data <- read.csv("airline_passenger_satisfaction.csv")
#Understanding the data types
str(data)
## 'data.frame': 129880 obs. of 24 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Gender : chr "Male" "Female" "Male" "Male" ...
## $ Age : int 48 35 41 50 49 43 43 60 50 38 ...
## $ Customer.Type : chr "First-time" "Returning" "Returning" "Returning" ...
## $ Type.of.Travel : chr "Business" "Business" "Business" "Business" ...
## $ Class : chr "Business" "Business" "Business" "Business" ...
## $ Flight.Distance : int 821 821 853 1905 3470 3788 1963 853 2607 2822 ...
## $ Departure.Delay : int 2 26 0 0 0 0 0 0 0 13 ...
## $ Arrival.Delay : int 5 39 0 0 1 0 0 3 0 0 ...
## $ Departure.and.Arrival.Time.Convenience: int 3 2 4 2 3 4 3 3 1 2 ...
## $ Ease.of.Online.Booking : int 3 2 4 2 3 4 3 4 1 5 ...
## $ Check.in.Service : int 4 3 4 3 3 3 4 3 3 3 ...
## $ Online.Boarding : int 3 5 5 4 5 5 4 4 2 5 ...
## $ Gate.Location : int 3 2 4 2 3 4 3 4 1 2 ...
## $ On.board.Service : int 3 5 3 5 3 4 5 3 4 5 ...
## $ Seat.Comfort : int 5 4 5 5 4 4 5 4 3 4 ...
## $ Leg.Room.Service : int 2 5 3 5 4 4 5 4 4 5 ...
## $ Cleanliness : int 5 5 5 4 5 3 4 4 3 4 ...
## $ Food.and.Drink : int 5 3 5 4 4 3 5 4 3 2 ...
## $ In.flight.Service : int 5 5 3 5 3 4 5 3 4 5 ...
## $ In.flight.Wifi.Service : int 3 2 4 2 3 4 3 4 4 2 ...
## $ In.flight.Entertainment : int 5 5 3 5 3 4 5 3 4 5 ...
## $ Baggage.Handling : int 5 5 3 5 3 4 5 3 4 5 ...
## $ Satisfaction : chr "Neutral or Dissatisfied" "Satisfied" "Satisfied" "Satisfied" ...
# Summary of Data statistics
summary(data)
## ID Gender Age Customer.Type
## Min. : 1 Length:129880 Min. : 7.00 Length:129880
## 1st Qu.: 32471 Class :character 1st Qu.:27.00 Class :character
## Median : 64941 Mode :character Median :40.00 Mode :character
## Mean : 64941 Mean :39.43
## 3rd Qu.: 97410 3rd Qu.:51.00
## Max. :129880 Max. :85.00
##
## Type.of.Travel Class Flight.Distance Departure.Delay
## Length:129880 Length:129880 Min. : 31 Min. : 0.00
## Class :character Class :character 1st Qu.: 414 1st Qu.: 0.00
## Mode :character Mode :character Median : 844 Median : 0.00
## Mean :1190 Mean : 14.71
## 3rd Qu.:1744 3rd Qu.: 12.00
## Max. :4983 Max. :1592.00
##
## Arrival.Delay Departure.and.Arrival.Time.Convenience
## Min. : 0.00 Min. :0.000
## 1st Qu.: 0.00 1st Qu.:2.000
## Median : 0.00 Median :3.000
## Mean : 15.09 Mean :3.058
## 3rd Qu.: 13.00 3rd Qu.:4.000
## Max. :1584.00 Max. :5.000
## NA's :393
## Ease.of.Online.Booking Check.in.Service Online.Boarding Gate.Location
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:2.000
## Median :3.000 Median :3.000 Median :3.000 Median :3.000
## Mean :2.757 Mean :3.306 Mean :3.253 Mean :2.977
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
##
## On.board.Service Seat.Comfort Leg.Room.Service Cleanliness
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :4.000 Median :4.000 Median :4.000 Median :3.000
## Mean :3.383 Mean :3.441 Mean :3.351 Mean :3.286
## 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
##
## Food.and.Drink In.flight.Service In.flight.Wifi.Service
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:2.000
## Median :3.000 Median :4.000 Median :3.000
## Mean :3.205 Mean :3.642 Mean :2.729
## 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000
##
## In.flight.Entertainment Baggage.Handling Satisfaction
## Min. :0.000 Min. :1.000 Length:129880
## 1st Qu.:2.000 1st Qu.:3.000 Class :character
## Median :4.000 Median :4.000 Mode :character
## Mean :3.358 Mean :3.632
## 3rd Qu.:4.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000
##
The column definitions below are taken from the dataset's Kaggle page. They help us understand the data's characteristics and how to use it in later stages.
| Field Name | Description |
|---|---|
| ID | Unique passenger identifier |
| Gender | Gender of the passenger (Female/Male) |
| Age | Age of the passenger |
| Customer Type | Type of airline customer (First-time/Returning) |
| Type of Travel | Purpose of the flight (Business/Personal) |
| Class | Travel class in the airplane for the passenger seat |
| Flight Distance | Flight distance in miles |
| Departure Delay | Flight departure delay in minutes |
| Arrival Delay | Flight arrival delay in minutes |
| Departure and Arrival Time Convenience | Satisfaction level with the convenience of the flight departure and arrival times from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| Ease of Online Booking | Satisfaction level with the online booking experience from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| Check-in Service | Satisfaction level with the check-in service from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| Online Boarding | Satisfaction level with the online boarding experience from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| Gate Location | Satisfaction level with the gate location in the airport from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| On-board Service | Satisfaction level with the on-boarding service in the airport from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| Seat Comfort | Satisfaction level with the comfort of the airplane seat from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| Leg Room Service | Satisfaction level with the leg room of the airplane seat from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| Cleanliness | Satisfaction level with the cleanliness of the airplane from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| Food and Drink | Satisfaction level with the food and drinks on the airplane from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| In-flight Service | Satisfaction level with the in-flight service from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| In-flight Wifi Service | Satisfaction level with the in-flight Wifi service from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| In-flight Entertainment | Satisfaction level with the in-flight entertainment from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| Baggage Handling | Satisfaction level with the baggage handling from the airline from 1 (lowest) to 5 (highest) - 0 means “not applicable” |
| Satisfaction | Overall satisfaction level with the airline (Satisfied/Neutral or Dissatisfied) |
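Because a rating of 0 means "not applicable" rather than a very poor score, keeping zeros in the rating columns pulls averages (such as the means from `summary()` above) downward. A minimal sketch that recodes 0 to `NA` in the rating columns, assuming the column names shown by `str(data)`; the helper `recode_not_applicable()` is hypothetical, and whether to apply it is an analysis choice this report does not make:

```r
# Rating columns (1-5 scale, 0 = not applicable), named as in str(data)
rating_cols <- c("Departure.and.Arrival.Time.Convenience", "Ease.of.Online.Booking",
                 "Check.in.Service", "Online.Boarding", "Gate.Location",
                 "On.board.Service", "Seat.Comfort", "Leg.Room.Service",
                 "Cleanliness", "Food.and.Drink", "In.flight.Service",
                 "In.flight.Wifi.Service", "In.flight.Entertainment",
                 "Baggage.Handling")

# Hypothetical helper: replace 0 ("not applicable") with NA in the given columns
recode_not_applicable <- function(df, cols) {
  cols <- intersect(cols, names(df))  # ignore columns that are not present
  df[cols] <- lapply(df[cols], function(x) replace(x, which(x == 0), NA))
  df
}

# Example usage (returns a modified copy; `data` itself is unchanged):
# data_na <- recode_not_applicable(data, rating_cols)
# colSums(is.na(data_na[rating_cols]))  # "not applicable" answers per column
```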
The business analysis focuses on which key parameters, from the passengers' perspective, determine their level of satisfaction with airline services. The analysis will be conducted by performing predictive modelling (regression analysis) as well as classification. Run the code below to see the workflow of the analysis.
file_path <- "Workflow_Group17.png"
knitr::include_graphics(file_path)
In this research, we aim to predict whether a future customer would be satisfied, given the details of the service, and to determine which aspects of the service are important to customer satisfaction. Based on the dataset, we hypothesize that service characteristics such as arrival and departure delays, on-board service, and comfort play a significant role in customer satisfaction.
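One way to test this hypothesis is a logistic regression of satisfaction on the hypothesized service variables. A minimal sketch using base R's `glm()` with a binomial family, assuming `Satisfaction` is recoded to 1 for "Satisfied" and 0 otherwise; the helper `fit_satisfaction_model()` and its default predictors (with Seat Comfort standing in for "comfort") are illustrative, and logistic regression is our example choice rather than necessarily the model used later in this report:

```r
# Hypothetical helper: logistic regression of satisfaction on selected predictors
fit_satisfaction_model <- function(df,
                                   predictors = c("Departure.Delay", "Arrival.Delay",
                                                  "On.board.Service", "Seat.Comfort")) {
  df$Satisfied <- as.integer(df$Satisfaction == "Satisfied")  # 1 = Satisfied, 0 = otherwise
  model_formula <- reformulate(predictors, response = "Satisfied")
  glm(model_formula, data = df, family = binomial)
}

# Example usage on the full dataset:
# model <- fit_satisfaction_model(data)
# summary(model)     # sign and significance of each service aspect
# exp(coef(model))   # odds ratios per one-unit change
```

A positive coefficient means that a higher value of that variable is associated with higher odds of being satisfied, holding the other predictors fixed.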
Given that the dataset does not provide information on the specific airline or ticket pricing, we will focus on analyzing the service-related factors that universally impact passenger satisfaction. Understanding and preparing the dataset will be crucial, as it may contain missing values or quality issues. Hence, we dedicate the next section to data exploration and preparation, starting with visualizations of the variables to understand their distributions.
In this chapter, we use visual and descriptive data exploration methods to gain a deeper understanding of the distributions and characteristics of the variables in our dataset. We begin by cleaning the data of missing values and handling outliers. We then explore each variable and interpret its features. Finally, we plot the data to show how the features are correlated with each other.
We start the exploration phase by checking for missing data using the is.na() function.
any(is.na(data))
## [1] TRUE
colMeans(is.na(data))
## ID Gender
## 0.00000000 0.00000000
## Age Customer.Type
## 0.00000000 0.00000000
## Type.of.Travel Class
## 0.00000000 0.00000000
## Flight.Distance Departure.Delay
## 0.00000000 0.00000000
## Arrival.Delay Departure.and.Arrival.Time.Convenience
## 0.00302587 0.00000000
## Ease.of.Online.Booking Check.in.Service
## 0.00000000 0.00000000
## Online.Boarding Gate.Location
## 0.00000000 0.00000000
## On.board.Service Seat.Comfort
## 0.00000000 0.00000000
## Leg.Room.Service Cleanliness
## 0.00000000 0.00000000
## Food.and.Drink In.flight.Service
## 0.00000000 0.00000000
## In.flight.Wifi.Service In.flight.Entertainment
## 0.00000000 0.00000000
## Baggage.Handling Satisfaction
## 0.00000000 0.00000000
We find that missing values occur only in the Arrival Delay column, where they make up about 0.3% of the rows, so they do not significantly affect the dataset. We fill the missing Arrival Delay values with the column median, which is 0 (the mean is about 15 minutes).
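# Store Departure Delay as numeric (double) instead of integer
data$Departure.Delay <- as.numeric(data$Departure.Delay)
# Replace missing Arrival Delay values with the column median (0)
data$Arrival.Delay[is.na(data$Arrival.Delay)] <- median(data$Arrival.Delay, na.rm = TRUE)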
any(is.na(data))
## [1] FALSE
We have successfully replaced the missing values in the dataset. Next, we check for numerical outliers using the summary() function, looking for large differences between the median and the mean.
# Extract summary statistics for numerical columns
numerical_summary <- data.frame(t(sapply(data[sapply(data, is.numeric)], summary)))
# Add a column to calculate the difference between mean and median
numerical_summary$Mean_Median_Difference <- abs(numerical_summary$Mean - numerical_summary$Median)
# Tabulate the output
knitr::kable(numerical_summary, caption = "Summary of Numerical Columns with Mean-Median Differences")
| Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | Mean-Median Difference |
|---|---|---|---|---|---|---|---|
| ID | 1 | 32470.75 | 64940.5 | 64940.500000 | 97410.25 | 129880 | 0.0000000 |
| Age | 7 | 27.00 | 40.0 | 39.427957 | 51.00 | 85 | 0.5720434 |
| Flight.Distance | 31 | 414.00 | 844.0 | 1190.316392 | 1744.00 | 4983 | 346.3163921 |
| Departure.Delay | 0 | 0.00 | 0.0 | 14.713713 | 12.00 | 1592 | 14.7137127 |
| Arrival.Delay | 0 | 0.00 | 0.0 | 15.045465 | 13.00 | 1584 | 15.0454650 |
| Departure.and.Arrival.Time.Convenience | 0 | 2.00 | 3.0 | 3.057599 | 4.00 | 5 | 0.0575993 |
| Ease.of.Online.Booking | 0 | 2.00 | 3.0 | 2.756876 | 4.00 | 5 | 0.2431244 |
| Check.in.Service | 0 | 3.00 | 3.0 | 3.306267 | 4.00 | 5 | 0.3062673 |
| Online.Boarding | 0 | 2.00 | 3.0 | 3.252633 | 4.00 | 5 | 0.2526332 |
| Gate.Location | 0 | 2.00 | 3.0 | 2.976925 | 4.00 | 5 | 0.0230751 |
| On.board.Service | 0 | 2.00 | 4.0 | 3.383023 | 4.00 | 5 | 0.6169772 |
| Seat.Comfort | 0 | 2.00 | 4.0 | 3.441361 | 5.00 | 5 | 0.5586387 |
| Leg.Room.Service | 0 | 2.00 | 4.0 | 3.350878 | 4.00 | 5 | 0.6491223 |
| Cleanliness | 0 | 2.00 | 3.0 | 3.286326 | 4.00 | 5 | 0.2863258 |
| Food.and.Drink | 0 | 2.00 | 3.0 | 3.204774 | 4.00 | 5 | 0.2047736 |
| In.flight.Service | 0 | 3.00 | 4.0 | 3.642193 | 5.00 | 5 | 0.3578072 |
| In.flight.Wifi.Service | 0 | 2.00 | 3.0 | 2.728696 | 4.00 | 5 | 0.2713043 |
| In.flight.Entertainment | 0 | 2.00 | 4.0 | 3.358077 | 4.00 | 5 | 0.6419233 |
| Baggage.Handling | 1 | 3.00 | 4.0 | 3.632114 | 5.00 | 5 | 0.3678857 |
remove(numerical_summary)
We notice large outliers in Flight Distance, as well as in Arrival Delay and Departure Delay. Let's look at them as boxplots:
Flight Distance
ggplot(data, aes(x=Flight.Distance)) +
geom_boxplot(fill = 2,alpha = 0.5,color = 1,outlier.colour = 2) +
theme_bw()
Departure Delay
ggplot(data, aes(x=Departure.Delay)) +
geom_boxplot(fill = 2,alpha = 0.5,color = 1,outlier.colour = 2) +
theme_bw()
Arrival Delay
ggplot(data, aes(x=Arrival.Delay)) +
geom_boxplot(fill = 2,alpha = 0.5,color = 1,outlier.colour = 2) +
theme_bw()
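From the boxplots above, we can see that Flight Distance is slightly right-skewed, while Arrival Delay and Departure Delay are heavily right-skewed. We therefore use median imputation, which is appropriate when the distribution of the data is skewed: values above the upper fence (the third quartile plus 1.5 times the interquartile range) are replaced with the column median.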
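To put a number on "slightly" versus "heavily" skewed, a sample skewness statistic can be computed before and after the imputation. A minimal sketch of the moment-based skewness coefficient in base R; the helper `sample_skewness()` is ours (packages such as `e1071` and `moments` provide similar functions with slightly different bias corrections):

```r
# Hypothetical helper: moment-based sample skewness, g1 = m3 / m2^(3/2)
# (0 for a symmetric distribution, positive for a long right tail)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3/2)
}

# Example usage on the three variables discussed above:
# sapply(data[c("Flight.Distance", "Departure.Delay", "Arrival.Delay")], sample_skewness)
```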
# Imputation of flight.distance variable
data$Flight.Distance[which(
data$Flight.Distance > (quantile(data$Flight.Distance, 0.75, na.rm = TRUE) + 1.5 * IQR(data$Flight.Distance, na.rm = TRUE))
)] <- median(data$Flight.Distance, na.rm = TRUE)
# Imputation of arrival.delay variable
data$Arrival.Delay[which(
data$Arrival.Delay > (quantile(data$Arrival.Delay, 0.75, na.rm = TRUE) + 1.5 * IQR(data$Arrival.Delay, na.rm = TRUE))
)] <- median(data$Arrival.Delay, na.rm = TRUE)
# Imputation of departure.delay variable
data$Departure.Delay[which(
data$Departure.Delay > (quantile(data$Departure.Delay, 0.75, na.rm = TRUE) + 1.5 * IQR(data$Departure.Delay, na.rm = TRUE))
)] <- median(data$Departure.Delay, na.rm = TRUE)
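The three blocks above apply the same rule: any value above Q3 + 1.5 × IQR is replaced with the column median. With the quartiles from the summary table (Q1 = 0 and Q3 = 12 minutes for Departure Delay), the fence is 12 + 1.5 × 12 = 30 minutes, so every departure delay above 30 minutes is set to the median of 0. A minimal sketch of the same rule as a reusable function; the name `impute_upper_outliers()` is ours:

```r
# Hypothetical helper: replace values above Q3 + 1.5 * IQR with the median,
# mirroring the three imputation blocks above (low values are left unchanged)
impute_upper_outliers <- function(x) {
  upper_fence <- quantile(x, 0.75, na.rm = TRUE) + 1.5 * IQR(x, na.rm = TRUE)
  x[which(x > upper_fence)] <- median(x, na.rm = TRUE)
  x
}

# Equivalent to the code above:
# for (col in c("Flight.Distance", "Arrival.Delay", "Departure.Delay")) {
#   data[[col]] <- impute_upper_outliers(data[[col]])
# }
```

Note that the median and fence are computed on the original column, so running the function a second time can flag new values, because the distribution changes after imputation.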
Now let’s check the distribution of the data after the imputation.
plot_histogram(data[ , c('Flight.Distance','Arrival.Delay','Departure.Delay')], ncol=2,ggtheme = theme_minimal())
Below, we will also take a look again at the boxplots for Departure Delay and Arrival Delay. Note that before the imputation, the plots were skewed to the point that the boxes were almost no longer visible.
ggplot(data, aes(x=Departure.Delay)) +
geom_boxplot(fill = 2,alpha = 0.5,color = 1,outlier.colour = 2) +
theme_bw()
ggplot(data, aes(x=Arrival.Delay)) +
geom_boxplot(fill = 2,alpha = 0.5,color = 1,outlier.colour = 2) +
theme_bw()
We can see that the imputation has substantially reduced the skewness of the data. We have thus eliminated all missing values and replaced the upper outliers flagged by the IQR rule. Next, we continue with the data exploration phase.
In this subchapter, we begin with a descriptive univariate analysis, followed by an analysis of the associations between variables.
Now we visualize the distribution of each categorical column.
# Bar chart will be used for categorical data
plot_bar(data, ncol=2, order_bar=TRUE, ggtheme = theme_minimal())
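Bar charts show the shape of each distribution, while the exact shares behind statements such as "the gender distribution is roughly equal" can be read from a proportion table. A minimal sketch in base R, assuming the categorical columns are the character (or factor) columns listed by `str(data)`; `category_shares()` is a hypothetical helper:

```r
# Hypothetical helper: percentage of observations in each category of every
# character or factor column, rounded to one decimal place
category_shares <- function(df) {
  cat_cols <- names(df)[vapply(df, function(x) is.character(x) || is.factor(x), logical(1))]
  lapply(df[cat_cols], function(x) round(100 * prop.table(table(x)), 1))
}

# Example usage:
# category_shares(data)$Gender          # share of Female vs Male passengers
# category_shares(data)$Customer.Type   # share of First-time vs Returning customers
```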
We can see that the gender distribution is roughly equal. Furthermore, most trips are business travel, and most passengers are returning rather than first-time customers. Hence, increasing the satisfaction of loyal customers could have a significant impact on the airline's overall customer satisfaction rate. Next, we visualize the distribution of each numerical column. The ID column is left out, as it is only an identifier.
# Numerical variables are visualized using histogram
plot_histogram(data[ , !names(data) %in% "ID"], ncol=2, ggtheme = theme_minimal())
From the visualizations above, it is not immediately clear which variables may be strongly influencing each other. Hence, we will continue the exploration by analyzing the associations between numerical variables.
numeric_data <- data[sapply(data, is.numeric)]
numeric_data <- na.omit(numeric_data)
# Compute correlation matrix
cor_matrix <- cor(numeric_data)
# Print the correlation matrix
print(cor_matrix)
## ID Age
## ID 1.0000000000 0.0203221824
## Age 0.0203221824 1.0000000000
## Flight.Distance 0.1022584170 0.0893852005
## Departure.Delay 0.0660485697 -0.0054671682
## Arrival.Delay 0.0224045458 -0.0074931552
## Departure.and.Arrival.Time.Convenience -0.0021920038 0.0369602634
## Ease.of.Online.Booking 0.0134000186 0.0225652378
## Check.in.Service 0.0793252087 0.0334753085
## Online.Boarding 0.0555378869 0.2075724231
## Gate.Location -0.0001130817 -0.0003980758
## On.board.Service 0.0555016852 0.0570776207
## Seat.Comfort 0.0521641921 0.1591359282
## Leg.Room.Service 0.0440883552 0.0391190045
## Cleanliness 0.0240475050 0.0525651010
## Food.and.Drink -0.0005103864 0.0231937123
## In.flight.Service 0.0787930170 -0.0513466111
## In.flight.Wifi.Service -0.0230963975 0.0161162160
## In.flight.Entertainment 0.0016204382 0.0749465159
## Baggage.Handling 0.0745692684 -0.0479910406
## Flight.Distance Departure.Delay
## ID 0.102258417 0.0660485697
## Age 0.089385200 -0.0054671682
## Flight.Distance 1.000000000 0.0144629202
## Departure.Delay 0.014462920 1.0000000000
## Arrival.Delay -0.001644757 0.4520762051
## Departure.and.Arrival.Time.Convenience -0.013972714 -0.0014122443
## Ease.of.Online.Booking 0.063665582 -0.0025286991
## Check.in.Service 0.070694837 -0.0023126209
## Online.Boarding 0.201323559 -0.0113489110
## Gate.Location 0.005805883 0.0016946084
## On.board.Service 0.104653887 -0.0071698780
## Seat.Comfort 0.147639915 -0.0025332906
## Leg.Room.Service 0.126539738 -0.0085318492
## Cleanliness 0.090210968 -0.0042499056
## Food.and.Drink 0.054242099 -0.0018817431
## In.flight.Service 0.056544236 0.0005323097
## In.flight.Wifi.Service 0.007242013 -0.0142761646
## In.flight.Entertainment 0.120349680 -0.0102650206
## Baggage.Handling 0.061386622 -0.0072871564
## Arrival.Delay
## ID 0.0224045458
## Age -0.0074931552
## Flight.Distance -0.0016447568
## Departure.Delay 0.4520762051
## Arrival.Delay 1.0000000000
## Departure.and.Arrival.Time.Convenience -0.0008254658
## Ease.of.Online.Booking -0.0053839321
## Check.in.Service -0.0206091252
## Online.Boarding -0.0310594292
## Gate.Location 0.0031652105
## On.board.Service -0.0281979818
## Seat.Comfort -0.0176498907
## Leg.Room.Service -0.0220063868
## Cleanliness -0.0200067898
## Food.and.Drink -0.0146718531
## In.flight.Service -0.0194704498
## In.flight.Wifi.Service -0.0215035673
## In.flight.Entertainment -0.0264131850
## Baggage.Handling -0.0243963222
## Departure.and.Arrival.Time.Convenience
## ID -0.0021920038
## Age 0.0369602634
## Flight.Distance -0.0139727139
## Departure.Delay -0.0014122443
## Arrival.Delay -0.0008254658
## Departure.and.Arrival.Time.Convenience 1.0000000000
## Ease.of.Online.Booking 0.4376196545
## Check.in.Service 0.0911317589
## Online.Boarding 0.0722868917
## Gate.Location 0.4475099458
## On.board.Service 0.0672969787
## Seat.Comfort 0.0086664448
## Leg.Room.Service 0.0106171078
## Cleanliness 0.0098620846
## Food.and.Drink 0.0006866832
## In.flight.Service 0.0721948030
## In.flight.Wifi.Service 0.3449151814
## In.flight.Entertainment -0.0083800141
## Baggage.Handling 0.0708330359
## Ease.of.Online.Booking Check.in.Service
## ID 0.013400019 0.079325209
## Age 0.022565238 0.033475309
## Flight.Distance 0.063665582 0.070694837
## Departure.Delay -0.002528699 -0.002312621
## Arrival.Delay -0.005383932 -0.020609125
## Departure.and.Arrival.Time.Convenience 0.437619655 0.091131759
## Ease.of.Online.Booking 1.000000000 0.008819308
## Check.in.Service 0.008819308 1.000000000
## Online.Boarding 0.404865758 0.204238015
## Gate.Location 0.460040547 -0.039353016
## On.board.Service 0.039064190 0.244618669
## Seat.Comfort 0.028560733 0.189979117
## Leg.Room.Service 0.109449655 0.152693216
## Cleanliness 0.015124786 0.176658031
## Food.and.Drink 0.030513982 0.085197877
## In.flight.Service 0.035372567 0.237601243
## In.flight.Wifi.Service 0.714806849 0.043762366
## In.flight.Entertainment 0.046563505 0.119554033
## Baggage.Handling 0.039148282 0.234503128
## Online.Boarding Gate.Location
## ID 0.055537887 -0.0001130817
## Age 0.207572423 -0.0003980758
## Flight.Distance 0.201323559 0.0058058834
## Departure.Delay -0.011348911 0.0016946084
## Arrival.Delay -0.031059429 0.0031652105
## Departure.and.Arrival.Time.Convenience 0.072286892 0.4475099458
## Ease.of.Online.Booking 0.404865758 0.4600405473
## Check.in.Service 0.204238015 -0.0393530157
## Online.Boarding 1.000000000 0.0027559844
## Gate.Location 0.002755984 1.0000000000
## On.board.Service 0.154242237 -0.0290187693
## Seat.Comfort 0.419252757 0.0027879444
## Leg.Room.Service 0.123225470 -0.0051811002
## Cleanliness 0.329377393 -0.0059176556
## Food.and.Drink 0.233500190 -0.0028721959
## In.flight.Service 0.074058441 0.0003104152
## In.flight.Wifi.Service 0.457445219 0.3385732308
## In.flight.Entertainment 0.283921540 0.0027408296
## Baggage.Handling 0.083541489 0.0009719310
## On.board.Service Seat.Comfort
## ID 0.055501685 0.052164192
## Age 0.057077621 0.159135928
## Flight.Distance 0.104653887 0.147639915
## Departure.Delay -0.007169878 -0.002533291
## Arrival.Delay -0.028197982 -0.017649891
## Departure.and.Arrival.Time.Convenience 0.067296979 0.008666445
## Ease.of.Online.Booking 0.039064190 0.028560733
## Check.in.Service 0.244618669 0.189979117
## Online.Boarding 0.154242237 0.419252757
## Gate.Location -0.029018769 0.002787944
## On.board.Service 1.000000000 0.130544875
## Seat.Comfort 0.130544875 1.000000000
## Leg.Room.Service 0.357721317 0.104272400
## Cleanliness 0.122083757 0.679613003
## Food.and.Drink 0.057404010 0.575846177
## In.flight.Service 0.551568828 0.068842149
## In.flight.Wifi.Service 0.119927683 0.121513245
## In.flight.Entertainment 0.418573575 0.611836657
## Baggage.Handling 0.520295528 0.074619552
## Leg.Room.Service Cleanliness
## ID 0.044088355 0.024047505
## Age 0.039119004 0.052565101
## Flight.Distance 0.126539738 0.090210968
## Departure.Delay -0.008531849 -0.004249906
## Arrival.Delay -0.022006387 -0.020006790
## Departure.and.Arrival.Time.Convenience 0.010617108 0.009862085
## Ease.of.Online.Booking 0.109449655 0.015124786
## Check.in.Service 0.152693216 0.176658031
## Online.Boarding 0.123225470 0.329377393
## Gate.Location -0.005181100 -0.005917656
## On.board.Service 0.357721317 0.122083757
## Seat.Comfort 0.104272400 0.679613003
## Leg.Room.Service 1.000000000 0.096694724
## Cleanliness 0.096694724 1.000000000
## Food.and.Drink 0.033172794 0.658053930
## In.flight.Service 0.369569478 0.090355980
## In.flight.Wifi.Service 0.160316959 0.131299526
## In.flight.Entertainment 0.300397442 0.692510538
## Baggage.Handling 0.371454684 0.097071490
## Food.and.Drink In.flight.Service
## ID -0.0005103864 0.0787930170
## Age 0.0231937123 -0.0513466111
## Flight.Distance 0.0542420987 0.0565442356
## Departure.Delay -0.0018817431 0.0005323097
## Arrival.Delay -0.0146718531 -0.0194704498
## Departure.and.Arrival.Time.Convenience 0.0006866832 0.0721948030
## Ease.of.Online.Booking 0.0305139818 0.0353725669
## Check.in.Service 0.0851978765 0.2376012426
## Online.Boarding 0.2335001900 0.0740584413
## Gate.Location -0.0028721959 0.0003104152
## On.board.Service 0.0574040099 0.5515688281
## Seat.Comfort 0.5758461771 0.0688421492
## Leg.Room.Service 0.0331727940 0.3695694779
## Cleanliness 0.6580539298 0.0903559796
## Food.and.Drink 1.0000000000 0.0352096628
## In.flight.Service 0.0352096628 1.0000000000
## In.flight.Wifi.Service 0.1322138724 0.1100285539
## In.flight.Entertainment 0.6234609372 0.4060936084
## Baggage.Handling 0.0353207442 0.6292371967
## In.flight.Wifi.Service
## ID -0.023096398
## Age 0.016116216
## Flight.Distance 0.007242013
## Departure.Delay -0.014276165
## Arrival.Delay -0.021503567
## Departure.and.Arrival.Time.Convenience 0.344915181
## Ease.of.Online.Booking 0.714806849
## Check.in.Service 0.043762366
## Online.Boarding 0.457445219
## Gate.Location 0.338573231
## On.board.Service 0.119927683
## Seat.Comfort 0.121513245
## Leg.Room.Service 0.160316959
## Cleanliness 0.131299526
## Food.and.Drink 0.132213872
## In.flight.Service 0.110028554
## In.flight.Wifi.Service 1.000000000
## In.flight.Entertainment 0.207801648
## Baggage.Handling 0.120375901
## In.flight.Entertainment Baggage.Handling
## ID 0.001620438 0.074569268
## Age 0.074946516 -0.047991041
## Flight.Distance 0.120349680 0.061386622
## Departure.Delay -0.010265021 -0.007287156
## Arrival.Delay -0.026413185 -0.024396322
## Departure.and.Arrival.Time.Convenience -0.008380014 0.070833036
## Ease.of.Online.Booking 0.046563505 0.039148282
## Check.in.Service 0.119554033 0.234503128
## Online.Boarding 0.283921540 0.083541489
## Gate.Location 0.002740830 0.000971931
## On.board.Service 0.418573575 0.520295528
## Seat.Comfort 0.611836657 0.074619552
## Leg.Room.Service 0.300397442 0.371454684
## Cleanliness 0.692510538 0.097071490
## Food.and.Drink 0.623460937 0.035320744
## In.flight.Service 0.406093608 0.629237197
## In.flight.Wifi.Service 0.207801648 0.120375901
## In.flight.Entertainment 1.000000000 0.379122757
## Baggage.Handling 0.379122757 1.000000000
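# Visualize the correlation matrix
# Note: repr.plot.width/height only affect Jupyter (IRkernel) notebooks; in R Markdown,
# the figure size is set with the chunk options fig.width and fig.height instead
options(repr.plot.width=14, repr.plot.height=12)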
corrplot(cor_matrix,
method = "color",
type = "upper",
order = "hclust",
tl.col = "black",
tl.cex = 0.8,
number.cex = 0.7,
addCoef.col = "black",
diag = FALSE)
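Two caveats apply to the matrix above: `ID` is only a row identifier, so its correlations carry no meaning, and the service ratings are ordinal (0 to 5), for which a rank-based Spearman correlation is often preferred over the default Pearson correlation. A minimal sketch of that alternative using only base R's `cor()`; the helper `spearman_matrix()` is ours, and this is a swapped-in variant rather than the analysis reported in this section:

```r
# Hypothetical helper: Spearman correlation matrix of the numeric columns,
# excluding identifier columns such as ID
spearman_matrix <- function(df, drop = "ID") {
  num <- df[vapply(df, is.numeric, logical(1))]
  num <- num[setdiff(names(num), drop)]
  # Spearman works on ranks, which suits ordinal 0-5 ratings;
  # "pairwise.complete.obs" uses all available values for each pair
  cor(num, method = "spearman", use = "pairwise.complete.obs")
}

# Example usage, with the same plot settings as above:
# corrplot(spearman_matrix(data), method = "color", type = "upper",
#          order = "hclust", tl.col = "black", tl.cex = 0.8, diag = FALSE)
```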
# Discover strong correlation
cor_matrix_flat <- as.data.frame(as.table(cor_matrix))
filtered <- subset(cor_matrix_flat, abs(Freq) > 0.5 & Var1 != Var2)
filtered <- filtered[!duplicated(t(apply(filtered[, 1:2], 1, sort))), ]
print(filtered)
## Var1 Var2 Freq
## 131 In.flight.Wifi.Service Ease.of.Online.Booking 0.7148068
## 206 In.flight.Service On.board.Service 0.5515688
## 209 Baggage.Handling On.board.Service 0.5202955
## 223 Cleanliness Seat.Comfort 0.6796130
## 224 Food.and.Drink Seat.Comfort 0.5758462
## 227 In.flight.Entertainment Seat.Comfort 0.6118367
## 262 Food.and.Drink Cleanliness 0.6580539
## 265 In.flight.Entertainment Cleanliness 0.6925105
## 284 In.flight.Entertainment Food.and.Drink 0.6234609
## 304 Baggage.Handling In.flight.Service 0.6292372
From the plot and the list above, we can see that several service variables are strongly correlated (|r| > 0.5) with each other, forming three groups: Seat Comfort, Cleanliness, Food and Drink, and In-flight Entertainment (r between 0.58 and 0.69); On-board Service, In-flight Service, and Baggage Handling (r between 0.52 and 0.63); and In-flight Wifi Service with Ease of Online Booking (r = 0.71). This suggests that customers who rate one aspect of the service highly tend to rate related aspects positively as well. Furthermore, the moderate positive correlation between Departure Delay and Arrival Delay (r = 0.45) indicates that flights that depart late are also likely to arrive late. Flight Distance shows only weak correlations (|r| < 0.21) with all other variables. Note that Satisfaction is not part of this matrix because it is a character variable, so its relationship with Flight Distance cannot be read from this plot.
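Because `Satisfaction` is stored as text, the matrix above cannot show directly how each variable relates to satisfaction. A minimal sketch that recodes it as 1 ("Satisfied") and 0 (otherwise) and correlates it with every numeric column except `ID`; the correlation between a 0/1 variable and a numeric one is the point-biserial correlation, and the helper name `satisfaction_correlations()` is ours:

```r
# Hypothetical helper: correlation of each numeric column with a 0/1 satisfaction flag
satisfaction_correlations <- function(df) {
  satisfied <- as.numeric(df$Satisfaction == "Satisfied")  # 1 = Satisfied, 0 = otherwise
  num <- df[vapply(df, is.numeric, logical(1))]
  num <- num[setdiff(names(num), "ID")]                    # ID is not a measured variable
  r <- vapply(num, function(x) cor(x, satisfied, use = "complete.obs"), numeric(1))
  sort(r, decreasing = TRUE)                               # strongest positive first
}

# Example usage:
# round(satisfaction_correlations(data), 3)
```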
In-flight Service vs On-board Service (in %)
count_data <- as.data.frame(table(data$In.flight.Service, data$On.board.Service))
count_data$Percent <- round(100 * count_data$Freq / sum(count_data$Freq), 1)
# Plot heatmap
ggplot(count_data, aes(x = Var1, y = Var2, fill = Percent)) +
geom_tile(color = "white") +
geom_text(aes(label = paste0(Percent, "%")), color = "black", size = 3) +
scale_fill_gradient(low = "lightyellow", high = "darkred") +
labs(title = "In-flight Service vs On-board Service (in %)",
x = "In-flight Service Rating",
y = "On-board Service Rating",
fill = "Percentage") +
theme_minimal()
The heatmap above shows a clear positive association between In-flight Service and On-board Service (r ≈ 0.55 in the correlation table above). This suggests that passengers who rate one aspect of the service highly also tend to rate the other positively. Darker shades indicate higher percentages, with the highest concentration (20.2%) at (In-flight Service = 4, On-board Service = 4), suggesting a tendency for passengers to rate both aspects positively together.
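The "most common rating combination" quoted for each heatmap can also be read off directly from the counts. A minimal sketch, assuming two rating vectors of equal length, computes the same share-of-all-responses percentage as the heatmap code and sorts the pairs. The helper `top_rating_pairs()` is hypothetical and the six toy ratings are illustrative only.

```r
# Hypothetical helper: share of all responses (in %) for every rating pair,
# sorted so the most common combinations come first.
top_rating_pairs <- function(x, y, n = 3) {
  counts <- as.data.frame(table(x = x, y = y), stringsAsFactors = FALSE)
  counts$Percent <- round(100 * counts$Freq / sum(counts$Freq), 1)
  # Sort on the raw counts so rounding cannot reorder near-ties
  counts <- counts[order(-counts$Freq), c("x", "y", "Percent")]
  rownames(counts) <- NULL
  head(counts, n)
}

# Toy ratings for six passengers (illustrative values only)
service_a <- c(4, 4, 4, 5, 5, 3)
service_b <- c(4, 4, 3, 5, 5, 3)
top_rating_pairs(service_a, service_b, n = 2)
```

For example, `top_rating_pairs(data$In.flight.Service, data$On.board.Service)` would list the leading cells of the heatmap above.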
In-flight Entertainment vs Food and Drink (in %)
count_data <- as.data.frame(table(data$In.flight.Entertainment, data$Food.and.Drink))
count_data$Percent <- round(100 * count_data$Freq / sum(count_data$Freq), 1)
# Plot heatmap
ggplot(count_data, aes(x = Var1, y = Var2, fill = Percent)) +
geom_tile(color = "white") +
geom_text(aes(label = paste0(Percent, "%")), color = "black", size = 3) +
scale_fill_gradient(low = "lightyellow", high = "darkred") +
labs(title = "In-flight Entertainment vs Food and Drink (in %)",
x = "In-flight Entertainment Rating",
y = "Food and Drink Rating",
fill = "Percentage") +
theme_minimal()
From the heatmap above, we can see a strong positive association between In-flight Entertainment and Food and Drink. The most common rating combination is (4,4), accounting for 16.9% of responses, followed by (5,5) with 14.8%. This suggests that passengers who rate the in-flight entertainment highly also tend to rate the food and drink positively.
In-flight Entertainment vs Cleanliness (in %)
count_data <- as.data.frame(table(data$In.flight.Entertainment, data$Cleanliness))
count_data$Percent <- round(100 * count_data$Freq / sum(count_data$Freq), 1)
# Plot heatmap
ggplot(count_data, aes(x = Var1, y = Var2, fill = Percent)) +
geom_tile(color = "white") +
geom_text(aes(label = paste0(Percent, "%")), color = "black", size = 3) +
scale_fill_gradient(low = "lightyellow", high = "darkred") +
labs(title = "In-flight Entertainment vs Cleanliness (in %)",
x = "In-flight Entertainment Rating",
y = "Cleanliness Rating",
fill = "Percentage") +
theme_minimal()
The heatmap above shows a positive association between In-flight Entertainment and Cleanliness. The most frequent rating pair is (4,4) at 17.9%, followed by (5,5) at 15.4%. This suggests that passengers who rate the in-flight entertainment highly also tend to rate the cleanliness positively.
In-flight Entertainment vs Seat Comfort (in %)
count_data <- as.data.frame(table(data$In.flight.Entertainment, data$Seat.Comfort))
count_data$Percent <- round(100 * count_data$Freq / sum(count_data$Freq), 1)
# Plot heatmap
ggplot(count_data, aes(x = Var1, y = Var2, fill = Percent)) +
geom_tile(color = "white") +
geom_text(aes(label = paste0(Percent, "%")), color = "black", size = 3) +
scale_fill_gradient(low = "lightyellow", high = "darkred") +
labs(title = "In-flight Entertainment vs Seat Comfort (in %)",
x = "In-flight Entertainment Rating",
y = "Seat Comfort Rating",
fill = "Percentage") +
theme_minimal()
This heatmap depicts a positive correlation between In-flight Entertainment and Seat Comfort. The most common rating combination is (4,4) at 17.9%, followed by (5,5) at 15.1%. This suggests that passengers who rate the in-flight entertainment highly also tend to rate the seat comfort positively.
Food and Drink vs Cleanliness (in %)
count_data <- as.data.frame(table(data$Food.and.Drink, data$Cleanliness))
count_data$Percent <- round(100 * count_data$Freq / sum(count_data$Freq), 1)
# Plot heatmap
ggplot(count_data, aes(x = Var1, y = Var2, fill = Percent)) +
geom_tile(color = "white") +
geom_text(aes(label = paste0(Percent, "%")), color = "black", size = 3) +
scale_fill_gradient(low = "lightyellow", high = "darkred") +
labs(title = "Food and Drink vs Cleanliness (in %)",
x = "Food and Drink Rating",
y = "Cleanliness Rating",
fill = "Percentage") +
theme_minimal()
Food and Drink vs Seat Comfort (in %)
count_data <- as.data.frame(table(data$Food.and.Drink, data$Seat.Comfort))
count_data$Percent <- round(100 * count_data$Freq / sum(count_data$Freq), 1)
# Plot heatmap
ggplot(count_data, aes(x = Var1, y = Var2, fill = Percent)) +
geom_tile(color = "white") +
geom_text(aes(label = paste0(Percent, "%")), color = "black", size = 3) +
scale_fill_gradient(low = "lightyellow", high = "darkred") +
labs(title = "Food and Drink vs Seat Comfort (in %)",
x = "Food and Drink Rating",
y = "Seat Comfort Rating",
fill = "Percentage") +
theme_minimal()
This heatmap illustrates a positive correlation between Food and Drink and Seat Comfort. The most common rating combination is (4,4) at 16%, followed by (5,5) at 13.8%. This suggests that passengers who rate the food and drink highly also tend to rate the seat comfort positively.
Cleanliness vs Seat Comfort (in %)
count_data <- as.data.frame(table(data$Cleanliness, data$Seat.Comfort))
count_data$Percent <- round(100 * count_data$Freq / sum(count_data$Freq), 1)
# Plot heatmap
ggplot(count_data, aes(x = Var1, y = Var2, fill = Percent)) +
geom_tile(color = "white") +
geom_text(aes(label = paste0(Percent, "%")), color = "black", size = 3) +
scale_fill_gradient(low = "lightyellow", high = "darkred") +
labs(title = "Cleanliness vs Seat Comfort (in %)",
x = "Cleanliness Rating",
y = "Seat Comfort Rating",
fill = "Percentage") +
theme_minimal()
This heatmap shows a positive correlation between Cleanliness and Seat Comfort. The most common rating combination is (4,4) at 17.9%, followed by (5,5) at 15.1%. This suggests that passengers who rate the cleanliness highly also tend to rate the seat comfort positively.
In-Flight Wifi Service vs Ease of Online Booking (in %)
count_data <- as.data.frame(table(data$In.flight.Wifi.Service, data$Ease.of.Online.Booking))
count_data$Percent <- round(100 * count_data$Freq / sum(count_data$Freq), 1)
# Plot heatmap
ggplot(count_data, aes(x = Var1, y = Var2, fill = Percent)) +
geom_tile(color = "white") +
geom_text(aes(label = paste0(Percent, "%")), color = "black", size = 3) +
scale_fill_gradient(low = "lightyellow", high = "darkred") +
labs(title = "In-Flight Wifi Service vs Ease of Online Booking (in %)",
x = "In-Flight Wifi Service Rating",
y = "Ease of Online Booking Rating",
fill = "Percentage") +
theme_minimal()
This heatmap shows a positive correlation between In-flight Wifi Service and Ease of Online Booking. The most common rating pair is (2,3) at 18.1%, closely followed by (1,2) at 18% and (1,1) at 12.5%, suggesting that many passengers who rated WiFi service poorly also found online booking less convenient. Interestingly, higher ratings for online booking (4-5) are associated with mixed ratings for WiFi, with 8.1% rating both as (5,5). This pattern implies that while passengers may find booking easy, WiFi service still receives moderate to low ratings.
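The heatmap percentages are shares of all passengers, so a claim such as "higher ratings for online booking are associated with mixed ratings for WiFi" is easier to check with conditional percentages. A minimal sketch, assuming two rating vectors, uses `prop.table(margin = 2)` so that each column (one booking rating) sums to 100%. The helper `conditional_pct()` is hypothetical and the toy ratings are illustrative only.

```r
# Hypothetical helper: distribution of rating x within each level of y.
# prop.table(margin = 2) makes every column of the table sum to 100%.
conditional_pct <- function(x, y) {
  tab <- table(x = x, y = y)
  round(100 * prop.table(tab, margin = 2), 1)
}

# Toy ratings (illustrative values only): x = WiFi, y = online booking
wifi    <- c(1, 1, 2, 5, 5, 3)
booking <- c(1, 1, 2, 5, 4, 4)
conditional_pct(wifi, booking)
```

Applied as `conditional_pct(data$In.flight.Wifi.Service, data$Ease.of.Online.Booking)`, the columns for booking ratings 4 and 5 would show how WiFi ratings are spread among passengers who found booking easy.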
Flight Distance vs Departure Delay
ggplot(data, aes(x = Flight.Distance, y = Departure.Delay)) +
geom_point(alpha = 0.2, color = "steelblue", size = 1.5) +
labs(title = "Flight Distance vs Departure Delay",
x = "Flight Distance",
y = "Departure Delay") +
geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 1) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
From the scatter plot above, we can see a slight positive correlation between Flight Distance and Departure Delay. The red line is the linear regression fit drawn by geom_smooth(method = "lm"), showing the general trend in the data. The delay values appear highly discrete, with many flights sharing common delay times (e.g., 0, 5, 10, 15 minutes). The fitted line suggests that departure delays increase slightly as flight distance increases, but the relationship is weak: departure delays vary widely at every flight distance, meaning that factors other than distance likely drive delays.
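Words like "slight" and "not strong" can be backed by numbers. A minimal sketch, using base R only, reports the slope of the same straight-line fit that `geom_smooth(method = "lm")` draws, together with the Pearson correlation and R². The helper `linear_fit_summary()` is hypothetical and the four toy points are illustrative only.

```r
# Hypothetical helper: summarise the straight-line fit that
# geom_smooth(method = "lm") draws, using base R only.
linear_fit_summary <- function(x, y) {
  ok  <- complete.cases(x, y)  # drop pairs with a missing value
  fit <- lm(y ~ x, data = data.frame(x = x[ok], y = y[ok]))
  c(slope     = unname(coef(fit)[2]),
    r         = cor(x[ok], y[ok]),
    r_squared = summary(fit)$r.squared)
}

# Toy data (illustrative only)
linear_fit_summary(1:4, c(1, 3, 2, 4))
```

Running `linear_fit_summary(data$Flight.Distance, data$Departure.Delay)` (and likewise for Arrival.Delay) would give the slope in minutes of delay per unit of distance; an R² close to 0 would confirm that distance explains little of the variation in delays.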
Flight Distance vs Arrival Delay
ggplot(data, aes(x = Flight.Distance, y = Arrival.Delay)) +
geom_point(alpha = 0.2, color = "steelblue", size = 1.5) +
labs(title = "Flight Distance vs Arrival Delay",
x = "Flight Distance",
y = "Arrival Delay") +
geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 1) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
The scatter plot above shows a slight positive correlation between Flight Distance and Arrival Delay. The red line, a linear regression fit, suggests that the average arrival delay remains relatively low regardless of flight distance. While there is some variability, the data does not indicate a strong relationship between flight distance and arrival delay.
Distributions of Numeric Variables by Group
data_subset <- data %>%
select(Satisfaction, Age, Flight.Distance, Departure.Delay,
Arrival.Delay)
# Convert to long format for faceting
data_long <- pivot_longer(data_subset, -Satisfaction, names_to = "Variable", values_to = "Value")
# Plot boxplots with facets
ggplot(data_long, aes(x = Satisfaction, y = Value, fill = Satisfaction)) +
geom_boxplot() +
facet_wrap(~ Variable, scales = "free_y") +
theme_minimal() +
labs(title = "Boxplots of Numeric Variables by Satisfaction",
x = NULL, y = NULL) +
theme(
strip.text = element_text(size = 12),
legend.position = "none"
)
The boxplots compare numeric variables (Age, Arrival Delay, Departure Delay, and Flight Distance) by satisfaction level (Neutral or Dissatisfied vs. Satisfied). Satisfied customers tend to be older and to fly longer distances, as indicated by the higher medians and broader distributions of these variables in the Satisfied group. Conversely, Arrival and Departure Delays show minimal differences between satisfaction groups, with both distributions being highly skewed and overlapping significantly. This suggests that delays may not strongly influence satisfaction compared to age and flight distance.
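The "higher medians" read from the boxplots can be tabulated directly. A minimal sketch, assuming a data frame with one grouping column and several numeric columns, returns a groups × variables matrix of medians. The helper `group_medians()` is hypothetical and the four-row toy data frame is illustrative only.

```r
# Hypothetical helper: median of each numeric variable within each group,
# i.e. the centre lines of the boxplots in table form.
group_medians <- function(df, group, vars) {
  sapply(vars, function(v) tapply(df[[v]], df[[group]], median, na.rm = TRUE))
}

# Toy data (illustrative values only)
toy <- data.frame(
  Satisfaction = c("Neutral or Dissatisfied", "Neutral or Dissatisfied",
                   "Satisfied", "Satisfied"),
  Age = c(20, 30, 40, 60),
  Flight.Distance = c(100, 300, 500, 700)
)
group_medians(toy, "Satisfaction", c("Age", "Flight.Distance"))
```

The same call with `group` set to "Class", "Type.of.Travel", "Gender", or "Customer.Type" would give the medians behind the boxplots in the following sections.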
data_subset <- data %>%
select(Satisfaction, Ease.of.Online.Booking, Check.in.Service, Online.Boarding,
Gate.Location, On.board.Service, Leg.Room.Service, Flight.Distance)
# Convert to long format
data_long <- pivot_longer(data_subset, -Satisfaction, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Satisfaction)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of Numeric Variables by Satisfaction",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
data_subset <- data %>%
select(Satisfaction, Seat.Comfort, In.flight.Service, Baggage.Handling,
In.flight.Entertainment, In.flight.Wifi.Service, Food.and.Drink, Cleanliness)
# Convert to long format
data_long <- pivot_longer(data_subset, -Satisfaction, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Satisfaction)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of Numeric Variables by Satisfaction",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
The histograms display the distribution of numeric variables grouped by customer satisfaction level, namely Neutral or Dissatisfied (pink) and Satisfied (blue), across various aspects of airline service. In general, higher ratings (closer to 5) for Check-in Service, Ease of Online Booking, Leg Room Service, Online Boarding, and Gate Location correspond more strongly with satisfied customers, while dissatisfaction is more prevalent at lower ratings (closer to 0 or 1). For Flight Distance, satisfied customers are slightly more common on longer flights. Baggage Handling, Cleanliness, Food and Drink, In-flight Entertainment, In-flight Service, In-flight Wi-Fi Service, and Seat Comfort follow the same pattern: higher ratings go with satisfaction and lower ratings with dissatisfaction. These distributions suggest that satisfaction is closely tied to the quality of service across these dimensions.
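Stacked counts mix two effects: how many passengers gave a rating and how many of them were satisfied. A minimal sketch, assuming the satisfied label is the string "Satisfied" as in the plots above, computes the percentage of satisfied passengers at each rating level instead. The helper `satisfied_pct_by_rating()` is hypothetical and the toy vectors are illustrative only.

```r
# Hypothetical helper: percentage of "Satisfied" passengers at each rating
# level. Unlike stacked counts, this is not affected by how many passengers
# gave each rating.
satisfied_pct_by_rating <- function(rating, satisfaction, positive = "Satisfied") {
  round(100 * tapply(satisfaction == positive, rating, mean, na.rm = TRUE), 1)
}

# Toy data (illustrative values only)
rating <- c(1, 1, 5, 5, 5)
satisfaction <- c("Neutral or Dissatisfied", "Neutral or Dissatisfied",
                  "Satisfied", "Satisfied", "Neutral or Dissatisfied")
satisfied_pct_by_rating(rating, satisfaction)
```

A steadily rising percentage from low to high ratings, for example in `satisfied_pct_by_rating(data$Online.Boarding, data$Satisfaction)`, would support the pattern described above more directly than the histogram heights.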
data_subset <- data %>%
select(Class, Age, Flight.Distance, Departure.Delay,
Arrival.Delay)
# Convert to long format for faceting
data_long <- pivot_longer(data_subset, -Class, names_to = "Variable", values_to = "Value")
# Plot boxplots with facets
ggplot(data_long, aes(x = Class, y = Value, fill = Class)) +
geom_boxplot() +
facet_wrap(~ Variable, scales = "free_y") +
theme_minimal() +
labs(title = "Boxplots of Numeric Variables by Class",
x = NULL, y = NULL) +
theme(
strip.text = element_text(size = 12),
legend.position = "none"
)
The boxplots compare Age, Arrival Delay, Departure Delay, and Flight Distance across Business, Economy, and Economy Plus classes. Business class passengers are generally older, with a higher median age compared to Economy and Economy Plus, which show similar distributions. Arrival and departure delays have medians near zero across all classes, indicating most flights are on time, but all classes exhibit outliers representing extreme delays. Flight distance is notably higher for Business class, with a greater median and variability compared to the shorter and more consistent distances in Economy and Economy Plus. This suggests Business class is preferred for longer flights and by older passengers, while delays are consistent across all classes.
data_subset <- data %>%
select(Class, Ease.of.Online.Booking, Check.in.Service, Online.Boarding,
Gate.Location, On.board.Service, Leg.Room.Service, Flight.Distance)
# Convert to long format
data_long <- pivot_longer(data_subset, -Class, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Class)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of Numeric Variables by Class",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
data_subset <- data %>%
select(Class, Seat.Comfort, In.flight.Service, Baggage.Handling,
In.flight.Entertainment, In.flight.Wifi.Service, Food.and.Drink, Cleanliness)
# Convert to long format
data_long <- pivot_longer(data_subset, -Class, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Class)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of Numeric Variables by Class",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
The histograms compare numeric variables across flight classes (Business, Economy, and Economy Plus). Business class consistently receives higher ratings across most variables, such as Check-in Service, Leg Room Service, Online Boarding, On-board Service, Cleanliness, Food and Drink, In-flight Entertainment, In-flight Wifi Service, and Seat Comfort, highlighting its superior service quality and comfort. Economy Plus generally performs better than Economy but falls short of Business class ratings. Economy class exhibits greater variability in ratings and is often skewed toward lower scores for services like leg room, on-board service, food and drink, and seat comfort. Flight Distance shows Business class dominating longer flights, while Economy handles shorter distances. Gate Location and Ease of Online Booking show relatively even distributions across classes. Overall, the data emphasizes the premium experience associated with Business class compared to Economy and Economy Plus.
data_subset <- data %>%
select(Type.of.Travel, Age, Flight.Distance, Departure.Delay,
Arrival.Delay)
# Convert to long format for faceting
data_long <- pivot_longer(data_subset, -Type.of.Travel, names_to = "Variable", values_to = "Value")
# Plot boxplots with facets
ggplot(data_long, aes(x = Type.of.Travel, y = Value, fill = Type.of.Travel)) +
geom_boxplot() +
facet_wrap(~ Variable, scales = "free_y") +
theme_minimal() +
labs(title = "Boxplots of Numeric Variables by Type of Travel",
x = NULL, y = NULL) +
theme(
strip.text = element_text(size = 12),
legend.position = "none"
)
The boxplots illustrate the distribution of four numeric variables (Age, Arrival Delay, Departure Delay, and Flight Distance) grouped by type of travel (Business or Personal). Business travelers tend to be slightly older than personal travelers, with a higher median age and a narrower age range. Both groups experience similar patterns in arrival and departure delays, though delays for personal travel show more variability and outliers. Flight distance varies significantly between the two groups, with business travelers generally covering shorter distances than personal travelers, whose distances exhibit a wider range and more outliers.
data_subset <- data %>%
select(Type.of.Travel, Ease.of.Online.Booking, Check.in.Service, Online.Boarding,
Gate.Location, On.board.Service, Leg.Room.Service, Flight.Distance)
# Convert to long format
data_long <- pivot_longer(data_subset, -Type.of.Travel, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Type.of.Travel)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of Numeric Variables by Type of Travel",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
data_subset <- data %>%
select(Type.of.Travel, Seat.Comfort, In.flight.Service, Baggage.Handling,
In.flight.Entertainment, In.flight.Wifi.Service, Food.and.Drink, Cleanliness)
# Convert to long format
data_long <- pivot_longer(data_subset, -Type.of.Travel, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Type.of.Travel)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of Numeric Variables by Type of Travel",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
Across most variables in the histograms, Business travelers dominate the counts, particularly for higher ratings (4 and 5). Because the bars are stacked counts and Personal travelers are the smaller group, part of this dominance reflects group size rather than greater satisfaction; comparing the share of high ratings within each group is a fairer test. Personal travelers show a more even distribution across lower and higher ratings. Flight Distance shows a distinct peak at shorter distances, where Business travelers are more frequent. Overall, Business travelers appear to rate services higher than Personal travelers.
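To separate group size from rating behaviour, a minimal sketch can normalise within each group: the percentage of passengers in that group who gave a rating of at least 4. The helper `high_rating_share()` and its `cutoff` parameter are hypothetical, and the six toy passengers are illustrative only.

```r
# Hypothetical helper: within each group, the percentage of passengers who
# gave a rating of at least `cutoff`. Normalising inside each group removes
# the effect of one group simply being larger.
high_rating_share <- function(rating, group, cutoff = 4) {
  round(100 * tapply(rating >= cutoff, group, mean, na.rm = TRUE), 1)
}

# Toy data (illustrative values only): Business has more high ratings in
# absolute terms (3 vs 1) partly because it has more passengers (4 vs 2)
toy_rating <- c(5, 4, 4, 2, 5, 1)
toy_group  <- c("Business", "Business", "Business", "Business",
                "Personal", "Personal")
high_rating_share(toy_rating, toy_group)
```

On the real data, `high_rating_share(data$Seat.Comfort, data$Type.of.Travel)` (or with `data$Class`, `data$Gender`, `data$Customer.Type`) gives a size-independent comparison for each service variable.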
data_subset <- data %>%
select(Gender, Age, Flight.Distance, Departure.Delay,
Arrival.Delay)
# Convert to long format for faceting
data_long <- pivot_longer(data_subset, -Gender, names_to = "Variable", values_to = "Value")
# Plot boxplots with facets
ggplot(data_long, aes(x = Gender, y = Value, fill = Gender)) +
geom_boxplot() +
facet_wrap(~ Variable, scales = "free_y") +
theme_minimal() +
labs(title = "Boxplots of Numeric Variables by Gender",
x = NULL, y = NULL) +
theme(
strip.text = element_text(size = 12),
legend.position = "none"
)
The boxplots compare four numeric variables of Age, Arrival Delay, Departure Delay, and Flight Distance by gender (Female and Male). For Age, males have a slightly higher median than females, with similar interquartile ranges (IQRs) and no visible outliers. Arrival Delay and Departure Delay show comparable distributions for both genders, with medians near zero, narrow IQRs, and numerous outliers indicating extreme delays. Flight Distance reveals a slightly higher median and wider variability among males compared to females, with some outliers in both groups representing unusually long flights. Overall, the distributions of delays are similar across genders, while Flight Distance shows more variability for males.
data_subset <- data %>%
select(Gender, Ease.of.Online.Booking, Check.in.Service, Online.Boarding,
Gate.Location, On.board.Service, Leg.Room.Service, Flight.Distance)
# Convert to long format
data_long <- pivot_longer(data_subset, -Gender, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Gender)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of Numeric Variables by Gender",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
data_subset <- data %>%
select(Gender, Seat.Comfort, In.flight.Service, Baggage.Handling,
In.flight.Entertainment, In.flight.Wifi.Service, Food.and.Drink, Cleanliness)
# Convert to long format
data_long <- pivot_longer(data_subset, -Gender, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Gender)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of Numeric Variables by Gender",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
The histograms illustrate the distribution of ratings for various airline service attributes (Baggage Handling, Cleanliness, Food and Drink, In-flight Entertainment, In-flight Service, In-flight WiFi Service, and Seat Comfort) segmented by gender (Female and Male). Across most categories, ratings of 4 and 5 are predominant, indicating overall positive feedback. Both genders exhibit similar patterns in their evaluations, with slight variations in proportions. For example, females appear to rate slightly higher in categories like In-flight Entertainment and Cleanliness than males. Conversely, males seem to have a marginally higher count of lower ratings (1–3) for some categories like Seat Comfort and In-flight WiFi Service. This suggests minor gender-based differences in ratings but an overall consistent tendency toward higher ratings across services.
data_subset <- data %>%
select(Customer.Type, Age, Flight.Distance, Departure.Delay,
Arrival.Delay)
# Convert to long format for faceting
data_long <- pivot_longer(data_subset, -Customer.Type, names_to = "Variable", values_to = "Value")
# Plot boxplots with facets
ggplot(data_long, aes(x = Customer.Type, y = Value, fill = Customer.Type)) +
geom_boxplot() +
facet_wrap(~ Variable, scales = "free_y") +
theme_minimal() +
labs(title = "Boxplots of Numeric Variables by Customer Type",
x = NULL, y = NULL) +
theme(
strip.text = element_text(size = 12),
legend.position = "none"
)
The boxplots compare numeric variables (Age, Arrival Delay, Departure Delay, and Flight Distance) between two customer types: First-time and Returning. For Age, returning customers tend to be older, with a higher median and wider range compared to first-time customers. Arrival Delay and Departure Delay show similar distributions for both groups, with most values concentrated around zero and a significant number of outliers indicating occasional long delays. For Flight Distance, returning customers generally travel longer distances, as evidenced by a higher median and greater variability compared to first-time customers. Overall, returning customers exhibit distinct patterns in age and flight distance, while delays are similarly distributed across both groups.
data_subset <- data %>%
select(Customer.Type, Ease.of.Online.Booking, Check.in.Service, Online.Boarding,
Gate.Location, On.board.Service, Leg.Room.Service, Flight.Distance)
# Convert to long format
data_long <- pivot_longer(data_subset, -Customer.Type, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Customer.Type)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of Numeric Variables by Customer Type",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
data_subset <- data %>%
select(Customer.Type, Seat.Comfort, In.flight.Service, Baggage.Handling,
In.flight.Entertainment, In.flight.Wifi.Service, Food.and.Drink, Cleanliness)
# Convert to long format
data_long <- pivot_longer(data_subset, -Customer.Type, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Customer.Type)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of Numeric Variables by Customer Type",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
The two sets of histograms display the distribution of numeric variables across two customer types: “First-time” (pink) and “Returning” (teal). Each service variable, such as “Check.in.Service,” “Leg.Room.Service,” and “Seat.Comfort,” is rated on a 0–5 scale, while “Flight.Distance” is a continuous distance; the counts for each value are stacked by customer type. Across most variables, returning customers consistently dominate the counts, suggesting that they make up the larger share of passengers. Ratings of 4 and 5 have higher counts for both groups in service categories like “Ease.of.Online.Booking,” “Leg.Room.Service,” and “Seat.Comfort,” indicating general satisfaction. However, variables like “Gate.Location” and “In.flight.Wifi.Service” show more spread-out ratings, suggesting room for improvement in these areas.
Recalling the univariate data analysis, we use five categorical variables that we deemed important as the base for further analysis, specifically to examine the association between pairs of categorical variables. These variables are gender, customer type, type of travel, class, and satisfaction (the split between satisfied and unsatisfied passengers).
plot_bar(data,ncol=2)
We use the plot_bar function to visualize the association between these variables. In each of the following five sections, we take one category in turn and plot its association with each of the remaining four categories. We start with the association between customer satisfaction and the other four variables.
plot_bar(data,by='Satisfaction',ncol=2)
From the univariate distribution, we already know that the numbers of satisfied and unsatisfied passengers differ noticeably, with the majority of passengers being neutral or dissatisfied. It is therefore not surprising that, regardless of which category satisfaction is paired with, most graphs are skewed towards the neutral or dissatisfied category.
However, the satisfaction level is distributed relatively equally across both genders, while first-time flyers are more likely to be dissatisfied than returning customers. Business travelers also tend to be more satisfied than personal travelers, and the same holds for business class passengers compared with economy class passengers. As the type of travel section below shows, passengers traveling for business are more likely to fly business class, so the two patterns are consistent with each other. This is understandable, since business class typically offers extra services and better quality of basic services, which makes it more likely to satisfy passengers. (Not yet sure whether this point is better placed here or in the data analysis section.) Hence, efforts to improve the satisfaction rate should focus on economy class passengers traveling for leisure, especially first-time flyers.
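The bar charts show associations visually; a single number per pair of categories makes them easier to rank. A minimal sketch uses Cramér's V, a 0–1 measure of association derived from the chi-squared statistic (0 = no association, 1 = perfect association). This measure is not part of the original analysis, the helper `cramers_v()` is hypothetical, and the toy vectors are illustrative only.

```r
# Hypothetical helper: Cramér's V for two categorical vectors.
# V = sqrt(chi^2 / (n * (k - 1))), where k is the smaller table dimension.
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # correct = FALSE: no Yates continuity correction for 2 x 2 tables;
  # warnings about small expected counts are suppressed for the toy example
  chi <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  k <- min(dim(tab))
  unname(sqrt(chi / (sum(tab) * (k - 1))))
}

# Toy examples (illustrative only)
cramers_v(c("A", "A", "B", "B"), c("yes", "yes", "no", "no"))  # perfectly associated: 1
cramers_v(c("A", "A", "B", "B"), c("yes", "no", "yes", "no"))  # independent: 0
```

Comparing, for example, `cramers_v(data$Gender, data$Satisfaction)` with `cramers_v(data$Customer.Type, data$Satisfaction)` would quantify the contrast described above, with a value near 0 supporting the view that gender is not a deciding factor.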
plot_bar(data,by='Gender',ncol=2)
In this section, we show the distribution of gender across the four other categories, which is relatively balanced in all graphs. Hence, we conclude that gender is not a deciding factor that needs further analysis in the data analysis section, meaning that it is not an important consideration for improving the satisfaction rate.
plot_bar(data,by='Customer.Type',ncol=2)
From the customer type distribution and association graphs, we can see that the majority of first-time passengers fly economy class and are also more likely to have a less favorable first impression of their flying experience, as seen in their relation with satisfaction level.
plot_bar(data,by='Type.of.Travel',ncol=2)
As we can see above, the majority of business travelers also fly business class, and they are more likely to be satisfied with their flying experience. This is consistent with our observations in the previous sections. On the other hand, travelers flying for personal reasons, such as leisure, are more likely to fly in either of the economy classes and are less likely to be satisfied with their flying experience.
plot_bar(data,by='Class',ncol=2)
Plan to remove this section because it does not seem to add any new observation; TBD.
We noticed that the age distribution has high variation, so we create an Age_Range column that divides passengers into four age groups: children, youth, adult, and senior. This categorization follows the age ranges commonly used by transportation services across Europe. The age groups are defined and plotted in R as follows:
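# Age groups: Children (<= 15), Youth (16-25), Adult (26-60), Senior (> 60).
# The earlier nested if_else used 26 < Age, which sent age 26 to "Senior".
data <- data %>% mutate(Age_Range = case_when(
  Age <= 15 ~ "Children",
  Age <= 25 ~ "Youth",
  Age <= 60 ~ "Adult",
  Age > 60  ~ "Senior"))  # missing ages stay NA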
plot_bar(data$Age_Range,ncol=2,order_bar=T,ggtheme = theme_minimal())
The age groups show that the largest segment of our customers belongs to the adult group. This is expected, since the adult group spans the widest range of ages compared to the other groups.
Another variable we can group is flight distance, which is currently right-skewed. We create a new variable by grouping flight distance into three categories: short-haul, medium-haul, and long-haul. The distance limits follow those defined by David W. Wragg in his book A Dictionary of Aviation:
Short-haul Flights: 0 - 900 miles (0 - 1,450 km)
Medium-haul Flights: 900 - 2,200 miles (1,450 - 3,540 km)
Long-haul Flights: 2,200 miles (3,540 km) and above
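# Short-haul: < 900 miles; Medium-haul: 900 to < 2,200 miles;
# Long-haul: 2,200 miles and above (matching the limits listed above)
data <- data %>% mutate(Distance_Group = case_when(
  Flight.Distance < 900   ~ "Short-haul",
  Flight.Distance < 2200  ~ "Medium-haul",
  Flight.Distance >= 2200 ~ "Long-haul"))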
plot_bar(data$Distance_Group, ncol = 2, order_bar = TRUE, ggtheme = theme_minimal())
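Boundary values are where hand-written band conditions most often go wrong. A minimal sketch of the same distance bands with base R's `cut()` makes the interval ends explicit; `right = FALSE` closes each band on the left, so 900 is Medium-haul and 2,200 is Long-haul. The helper `distance_group()` is hypothetical.

```r
# Hypothetical helper: distance bands [0, 900), [900, 2200), [2200, Inf)
distance_group <- function(distance) {
  cut(distance,
      breaks = c(0, 900, 2200, Inf),
      labels = c("Short-haul", "Medium-haul", "Long-haul"),
      right = FALSE)  # left-closed intervals
}

distance_group(c(899, 900, 2199, 2200))
```

With the default `right = TRUE` and `breaks = c(-Inf, 15, 25, 60, Inf)`, the same pattern reproduces the age groups above for whole-number ages.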
We can classify whether each customer experienced a delay based on departure delay and arrival delay. The United States Federal Aviation Administration (FAA) defines a flight as delayed if it is 15 minutes or more behind schedule. The following code creates new variables that compare the number of flights with departure delay and arrival delay.
# FAA: delayed means 15 minutes or more behind schedule, hence >= 15
data$departure.delay.status <- ifelse(data$Departure.Delay >= 15, "Delayed", "On Time")
data$arrival.delay.status <- ifelse(data$Arrival.Delay >= 15, "Delayed", "On Time")
plot_bar(data$departure.delay.status,ncol=2,order_bar=T,ggtheme = theme_minimal())
plot_bar(data$arrival.delay.status,ncol=2,order_bar=T,ggtheme = theme_minimal())
Because departure and arrival delays are equally important, we combine them into a single variable, delay.status, using the OR operator: a flight counts as delayed if either delay reaches the threshold. The plot shows the total number of delayed flights.
data$delay.status = ifelse(data$Departure.Delay >= 15 | data$Arrival.Delay >= 15, "Delayed", "On Time")
plot_bar(data$delay.status,ncol=2,order_bar=T,ggtheme = theme_minimal())
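The combined rule can be wrapped in a small function to check its edge cases. A minimal sketch with a hypothetical helper `delay_status()` on toy delay values (in minutes); it uses `>=` so that a delay of exactly 15 minutes counts as delayed, matching the FAA definition:

```r
# Hypothetical helper: a flight is "Delayed" if either delay reaches the threshold
delay_status <- function(departure_delay, arrival_delay, threshold = 15) {
  delayed <- departure_delay >= threshold | arrival_delay >= threshold
  ifelse(delayed, "Delayed", "On Time")
}

status <- delay_status(departure_delay = c(0, 15, 14,  3, NA),
                       arrival_delay   = c(0,  0, 14, 20,  0))
status
```

If either delay column contains missing values, note how `|` behaves: `NA | TRUE` is `TRUE`, so a missing delay only yields `NA` when the other delay is below the threshold (last element above).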
Next, we group the individual service ratings into categories that the airline could improve (the new variable name is shown in parentheses):
Flight General Service (In.flight.service): On-board Service, Food and Drink, In-flight Service
Flight Entertainment (Flight.Entertainment): In Flight Wifi Service, In Flight Entertainment
Pre-flight Service (Pre.flight.service): Ease of Online Booking, Check-in Service, Online Boarding
Comfortability (Comfortability): Seat Comfort, Leg Room Service, Cleanliness
data$In.flight.service = rowSums(data[,c('On.board.Service','Food.and.Drink','In.flight.Service')],na.rm = TRUE)
data$Flight.Entertainment = rowSums(data[,c('In.flight.Wifi.Service','In.flight.Entertainment')],na.rm = TRUE)
data$Pre.flight.service = rowSums(data[,c('Ease.of.Online.Booking','Check.in.Service','Online.Boarding')],na.rm = TRUE)
data$Comfortability = rowSums(data[,c('Seat.Comfort','Leg.Room.Service','Cleanliness')],na.rm = TRUE)
plot_histogram(data[ , c("In.flight.service", "Flight.Entertainment", "Pre.flight.service", "Comfortability")], ncol=2,ggtheme = theme_minimal())
Combining related services into one variable gives an overview of passenger satisfaction in each area of improvement. For example, the histograms above suggest that passengers generally found their flights comfortable. However, the pre-flight service and flight entertainment categories received mixed reviews and can be considered improvement areas for the airline; further analysis could identify which elements of those services dissatisfy passengers the most.
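Because In.flight.service, Pre.flight.service, and Comfortability each sum three 0-5 ratings (range 0-15) while Flight.Entertainment sums two (range 0-10), the composite histograms are not on a common scale. A per-item mean is one way to put every category back on the 0-5 survey scale. A minimal sketch on hypothetical toy ratings, not values from the dataset:

```r
# Toy ratings on the 0-5 survey scale (hypothetical values)
ratings <- data.frame(
  On.board.Service        = c(5, 3, 1),
  Food.and.Drink          = c(4, 3, 2),
  In.flight.Service       = c(5, 4, 0),
  In.flight.Wifi.Service  = c(2, 5, 1),
  In.flight.Entertainment = c(3, 5, 1)
)
# Columns making up two of the composite categories
groups <- list(
  In.flight.service    = c("On.board.Service", "Food.and.Drink", "In.flight.Service"),
  Flight.Entertainment = c("In.flight.Wifi.Service", "In.flight.Entertainment")
)
# Sum of ratings: the range depends on the number of items (0-15 vs 0-10)
sums  <- sapply(groups, function(cols) rowSums(ratings[, cols], na.rm = TRUE))
# Per-item mean: always back on the 0-5 scale
means <- sapply(groups, function(cols) rowMeans(ratings[, cols], na.rm = TRUE))
sums
round(means, 2)
```

The mean also handles missing ratings more gracefully: with `na.rm = TRUE`, `rowSums()` effectively counts a missing rating as 0, whereas `rowMeans()` averages over the ratings that are present.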
NOTE: EVERYTHING FROM FEATURE ENGINEERING TO THE NEXT SECTION MUST BE CROSS-CHECKED; SOME NEW VARIABLES HAVE CHANGED RANGE AND NAMING
Data transformation puts the variables on a comparable scale and reduces skew. We start by looking at the distribution of each variable.
library(ggplot2)
library(DataExplorer)
# Set default colors for histogram bars
update_geom_defaults("bar", list(fill = "steelblue", color = NA))
# Generate histogram with updated colors
plot_histogram(data,
ncol = 2,
ggtheme = theme_minimal() + theme(strip.background = element_blank(),
strip.text = element_text(color = "black")))
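Next, we reduce the skew of flight distance, age, and the delay durations with a log transformation, log(1 + x). For the delays, only positive values are kept: flights with no delay are set to NA, so the log-transformed delay columns describe delayed flights only.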
data$Departure.Delay.Duration = ifelse(data$Departure.Delay > 0, data$Departure.Delay, NA)
data$Arrival.Delay.Duration = ifelse(data$Arrival.Delay > 0, data$Arrival.Delay, NA)
data$Flight.Distance.log = log(1 + data$Flight.Distance)
data$Age.log = log(1 + data$Age)
data$Departure.Delay.Duration.log = log(1 + data$Departure.Delay.Duration)
data$Arrival.Delay.Duration.log = log(1 + data$Arrival.Delay.Duration)
plot_histogram(data[, c("Flight.Distance.log", "Age.log", "Departure.Delay.Duration.log", "Arrival.Delay.Duration.log")], ncol=2,ggtheme = theme_minimal())
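Base R's `log1p()` computes log(1 + x) directly and is numerically more accurate for very small x, so it is a drop-in alternative to `log(1 + x)`. A minimal sketch on toy delay values (minutes), mirroring the step above where non-positive delays become `NA` before the transform:

```r
delays <- c(0, 0, 5, 30, 240)                 # toy delays in minutes
# As in the text: keep only positive delays, everything else becomes NA
duration <- ifelse(delays > 0, delays, NA)
duration_log <- log1p(duration)               # same as log(1 + duration)
all.equal(duration_log, log(1 + duration))    # TRUE; NAs are in the same places
duration_log
```

Flights without a delay end up as missing values rather than log(1) = 0, which is worth keeping in mind when these columns are later passed to models that drop or impute `NA`.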
After the log transformation, we standardize each variable to a z-score (mean 0, standard deviation 1):
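# scale() returns a one-column matrix; as.numeric() keeps a plain numeric column
data$Flight.Distance.log.z = as.numeric(scale(data$Flight.Distance.log))
data$Age.log.z = as.numeric(scale(data$Age.log))
data$Departure.Delay.Duration.log.z = as.numeric(scale(data$Departure.Delay.Duration.log))
data$Arrival.Delay.Duration.log.z = as.numeric(scale(data$Arrival.Delay.Duration.log))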
plot_histogram(data[, c("Flight.Distance.log.z", "Age.log.z", "Departure.Delay.Duration.log.z", "Arrival.Delay.Duration.log.z")], ncol=2,ggtheme = theme_minimal())
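As the histograms show, the transformed variables are now centered at 0 with a standard deviation of 1. Standardizing puts the variables on the same scale (it does not change the shape of their distributions), which prevents variables with large ranges, such as Flight.Distance, from dominating variance- and distance-based methods such as PCA and clustering.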
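The z-score computed by `scale()` is simply (x - mean(x)) / sd(x). A minimal sketch on a toy vector shows the equivalence and the resulting mean and standard deviation:

```r
x <- c(2.1, 3.4, 5.0, 6.2, 7.3)             # toy values
z_scale  <- as.numeric(scale(x))            # scale() returns a 1-column matrix
z_manual <- (x - mean(x)) / sd(x)           # the same z-score written out
all.equal(z_scale, z_manual)                # TRUE
c(mean = mean(z_scale), sd = sd(z_scale))   # 0 and 1, up to floating-point error
```

Because the transformation is linear, any remaining skew comes through unchanged; reducing skew is the job of the log step, not of `scale()`.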
Our target is passenger satisfaction, so we need to identify how each variable relates to it. This shows how each variable behaves with respect to the main objective of the analysis. The following plots break down each variable by satisfaction.
plot_bar(data,by='Satisfaction',ncol=2)
library(dplyr)   # select()
library(tidyr)   # pivot_longer()
data_subset <- data %>%
select(Satisfaction, In.flight.service, Flight.Entertainment, Pre.flight.service, Comfortability)
# Convert to long format
data_long <- pivot_longer(data_subset, -Satisfaction, names_to = "Variable", values_to = "Value")
# Create faceted histogram
ggplot(data_long, aes(x = Value, fill = Satisfaction)) +
geom_histogram(position = "stack", alpha = 0.5, bins = 30) +
facet_wrap(~ Variable, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Histograms of New Numeric Variables by Satisfaction",
x = NULL, y = "Count") +
theme(
strip.text = element_text(size = 12),
legend.position = "bottom"
)
The histograms above show the distribution of satisfaction across the four service categories: in-flight service, flight entertainment, pre-flight service, and comfortability. The split between satisfied and neutral-or-dissatisfied passengers does not differ markedly between categories.
data %>% group_by(Class, Satisfaction) %>% summarize(num=n()) %>% mutate(percentage=round(num*100/sum(num),2))
## `summarise()` has grouped output by 'Class'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 4
## # Groups: Class [3]
## Class Satisfaction num percentage
## <chr> <chr> <int> <dbl>
## 1 Business Neutral or Dissatisfied 18994 30.6
## 2 Business Satisfied 43166 69.4
## 3 Economy Neutral or Dissatisfied 47366 81.2
## 4 Economy Satisfied 10943 18.8
## 5 Economy Plus Neutral or Dissatisfied 7092 75.4
## 6 Economy Plus Satisfied 2319 24.6
Based on the table above, 69.44% of Business class passengers are satisfied, compared with only 18.77% in Economy and 24.64% in Economy Plus. The airline's services therefore satisfy business class passengers far more than economy passengers, which highlights the need to improve the customer experience in Economy and Economy Plus in particular.
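The percentage column in the dplyr pipeline is a within-class (row-wise) proportion. In base R, `prop.table()` with `margin = 1` gives the same figures; a minimal sketch using the counts printed in the table above:

```r
# Counts copied from the Class x Satisfaction table above
counts <- matrix(c(18994, 43166,
                   47366, 10943,
                    7092,  2319),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Class = c("Business", "Economy", "Economy Plus"),
                                 Satisfaction = c("Neutral or Dissatisfied", "Satisfied")))
# margin = 1: percentages within each class, so every row sums to 100
pct <- round(100 * prop.table(counts, margin = 1), 2)
pct
```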
data %>% group_by(Type.of.Travel, Satisfaction) %>%
summarize(num=n()) %>%
mutate(percentage=round(num*100/sum(num),2))
## `summarise()` has grouped output by 'Type.of.Travel'. You can override using
## the `.groups` argument.
## # A tibble: 4 × 4
## # Groups: Type.of.Travel [2]
## Type.of.Travel Satisfaction num percentage
## <chr> <chr> <int> <dbl>
## 1 Business Neutral or Dissatisfied 37337 41.6
## 2 Business Satisfied 52356 58.4
## 3 Personal Neutral or Dissatisfied 36115 89.9
## 4 Personal Satisfied 4072 10.1
Personal travelers make up just under a third of all passengers (30.94%), and only 10.13% of them are satisfied, far below the 58.37% satisfaction rate of business travelers. This points to gaps in the service offering or customer experience for personal travelers, which makes them an important segment to improve by understanding what drives their satisfaction.
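The two figures in this paragraph use different denominators: the segment's share of all passengers, and the satisfaction rate within the segment. A minimal sketch computing both from the Type.of.Travel counts printed above:

```r
# Counts copied from the Type.of.Travel x Satisfaction table above
travel <- matrix(c(37337, 52356,
                   36115,  4072),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Type.of.Travel = c("Business", "Personal"),
                                 Satisfaction = c("Neutral or Dissatisfied", "Satisfied")))
# Share of all passengers (denominator: grand total)
share_of_passengers <- round(100 * rowSums(travel) / sum(travel), 2)
# Satisfaction rate within each segment (denominator: segment total)
satisfaction_rate <- round(100 * travel[, "Satisfied"] / rowSums(travel), 2)
rbind(share_of_passengers, satisfaction_rate)
```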
data %>% group_by(Customer.Type, Satisfaction) %>%
summarize(num=n()) %>%
mutate(percentage=round(num*100/sum(num),2))
## `summarise()` has grouped output by 'Customer.Type'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 4
## # Groups: Customer.Type [2]
## Customer.Type Satisfaction num percentage
## <chr> <chr> <int> <dbl>
## 1 First-time Neutral or Dissatisfied 18080 76.0
## 2 First-time Satisfied 5700 24.0
## 3 Returning Neutral or Dissatisfied 55372 52.2
## 4 Returning Satisfied 50728 47.8
First-time customers: a majority (76.03%) are neutral or dissatisfied, while only 23.97% report being satisfied. This is a concern, since it indicates that the airline is at risk of not retaining first-time customers.
Returning customers: 47.81% are satisfied, while 52.19% are neutral or dissatisfied. Returning customers are therefore considerably more satisfied with the airline's services than first-time customers, which highlights the need to improve the customer experience for first-time customers in particular.
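To check that the gap between first-time and returning customers is not just sampling noise, a two-sample test of proportions is one option; base R's `prop.test()` runs a chi-squared test on the counts from the Customer.Type table above. A minimal sketch:

```r
# Satisfied counts and group sizes from the Customer.Type table above
satisfied <- c(First.time = 5700, Returning = 50728)
totals    <- c(First.time = 5700 + 18080, Returning = 50728 + 55372)
test <- prop.test(x = satisfied, n = totals)
round(100 * test$estimate, 2)   # satisfaction rate (%) in each group
test$p.value
```

With samples this large, almost any gap is statistically significant, so the size of the gap (about 24 percentage points, 23.97% vs 47.81%) is the more useful number for the business discussion.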
data %>% filter(Customer.Type == 'First-time') %>% group_by(Class ,Satisfaction) %>%
summarize(num=n()) %>%
mutate(percentage=round(num*100/sum(num),2))
## `summarise()` has grouped output by 'Class'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 4
## # Groups: Class [3]
## Class Satisfaction num percentage
## <chr> <chr> <int> <dbl>
## 1 Business Neutral or Dissatisfied 5569 60.3
## 2 Business Satisfied 3662 39.7
## 3 Economy Neutral or Dissatisfied 11669 85.6
## 4 Economy Satisfied 1965 14.4
## 5 Economy Plus Neutral or Dissatisfied 842 92.0
## 6 Economy Plus Satisfied 73 7.98
The table shows large differences in first-time passenger satisfaction across flight classes. Business class has the highest satisfaction, with nearly 40% (39.7%) satisfied, although more than half (60.3%) remain neutral or dissatisfied. Economy fares worse, with 85.6% of passengers neutral or dissatisfied, and Economy Plus performs the poorest, with 92.0% neutral or dissatisfied. Improving services in Economy and Economy Plus could substantially raise overall customer satisfaction.
In conclusion, most first-time passengers are neutral or dissatisfied in every class, notably more so in Economy and Economy Plus. This is a concern for the airline: it needs to improve services for first-time customers in order to retain them and raise satisfaction rates.
data %>% group_by(Distance_Group) %>%
summarize(num=n()) %>%
mutate(percentage=round(num*100/sum(num),2))
## # A tibble: 3 × 3
## Distance_Group num percentage
## <chr> <int> <dbl>
## 1 Long-haul 20631 15.9
## 2 Medium-haul 37679 29.0
## 3 Short-haul 71570 55.1
In the satisfaction bar plots above, the share of dissatisfied customers increases as flight distance decreases, and flights below about 1,000 miles have the highest dissatisfaction rate. This is a major concern because most passengers fly these shorter routes: short-haul flights (under 900 miles) alone account for 55.1% of passengers in the table above, and more than 60% of passengers fly less than 1,000 miles. The airline therefore needs to improve services for short-distance travelers to retain them and increase satisfaction rates.
Next, we analyze the service factors related to customer satisfaction, starting with customer type:
# Keep first-time customers and drop ID, the raw continuous columns, and their transformed copies
filtered_data <- data %>% filter(Customer.Type == 'First-time')
filtered_data <- subset(filtered_data, select = -c(ID, Flight.Distance, Age, Arrival.Delay, Departure.Delay, Arrival.Delay.Duration, Arrival.Delay.Duration.log,
Arrival.Delay.Duration.log.z, Departure.Delay.Duration, Departure.Delay.Duration.log, Departure.Delay.Duration.log.z,
Flight.Distance.log, Flight.Distance.log.z, Age.log, Age.log.z))
# Compute the mean of each numeric (rating) column; non-numeric columns return NA
mean_values <- sapply(filtered_data, function(x) if(is.numeric(x)) mean(x, na.rm = TRUE) else NA)
# Remove NA values (non-numeric columns)
mean_values <- na.omit(mean_values)
mean_df <- data.frame(Variable = names(mean_values), Mean = mean_values)
# Sort in descending order
mean_df <- mean_df[order(mean_df$Mean, decreasing = TRUE), ]
# Plot bar chart with highest value on top and number labels
ggplot(mean_df, aes(x = reorder(Variable, Mean), y = Mean)) +
geom_bar(stat = "identity", fill = "#CFCFFF") +
geom_text(aes(label = round(Mean, 2)), hjust = -0.2) +
coord_flip() +
theme_minimal() +
labs(title = "Mean Service Ratings: First-time Customers", x = "Variable", y = "Mean Value")
For first-time customers, the lowest-rated services are Departure and Arrival Time Convenience, Ease of Online Booking, and In-flight Wifi Service, while Comfortability has the highest mean. Note that Comfortability and the other composite scores are sums of several 0-5 ratings, so their means sit on a larger scale than the individual ratings and are not directly comparable with them. These results can guide strategic improvements to the first-time customer experience by focusing on the low-scoring satisfaction factors.
# Keep personal travelers and drop ID, the raw continuous columns, and their transformed copies
filtered_data <- data %>% filter(Type.of.Travel == 'Personal')
filtered_data <- subset(filtered_data, select = -c(ID, Flight.Distance, Age, Arrival.Delay, Departure.Delay, Arrival.Delay.Duration, Arrival.Delay.Duration.log,
Arrival.Delay.Duration.log.z, Departure.Delay.Duration, Departure.Delay.Duration.log, Departure.Delay.Duration.log.z,
Flight.Distance.log, Flight.Distance.log.z, Age.log, Age.log.z))
# Compute the mean of each numeric (rating) column; non-numeric columns return NA
mean_values <- sapply(filtered_data, function(x) if(is.numeric(x)) mean(x, na.rm = TRUE) else NA)
# Remove NA values (non-numeric columns)
mean_values <- na.omit(mean_values)
mean_df <- data.frame(Variable = names(mean_values), Mean = mean_values)
# Sort in descending order
mean_df <- mean_df[order(mean_df$Mean, decreasing = TRUE), ]
# Plot bar chart with highest value on top and number labels
ggplot(mean_df, aes(x = reorder(Variable, Mean), y = Mean)) +
geom_bar(stat = "identity", fill = "#008080") +
geom_text(aes(label = round(Mean, 2)), hjust = -0.2) +
coord_flip() +
theme_minimal() +
labs(title = "Mean Service Ratings: Personal Travelers", x = "Variable", y = "Mean Value")
For personal travelers, the lowest-rated services are Departure and Arrival Time Convenience, In-flight Wifi Service, and Ease of Online Booking. These areas could be prioritized to improve overall satisfaction for personal travelers. The highest-scoring factor is Comfortability.
Personal travelers therefore show a similar pattern to first-time passengers. Improving low-rated areas such as Wi-Fi service and ease of online booking may help raise the overall customer experience for this segment.
# Keep Economy passengers and drop ID, the raw continuous columns, and their transformed copies
filtered_data <- data %>% filter(Class == 'Economy')
filtered_data <- subset(filtered_data, select = -c(ID, Flight.Distance, Age, Arrival.Delay, Departure.Delay, Arrival.Delay.Duration, Arrival.Delay.Duration.log,
Arrival.Delay.Duration.log.z, Departure.Delay.Duration, Departure.Delay.Duration.log, Departure.Delay.Duration.log.z,
Flight.Distance.log, Flight.Distance.log.z, Age.log, Age.log.z))
# Compute the mean of each numeric (rating) column; non-numeric columns return NA
mean_values <- sapply(filtered_data, function(x) if(is.numeric(x)) mean(x, na.rm = TRUE) else NA)
# Remove NA values (non-numeric columns)
mean_values <- na.omit(mean_values)
mean_df <- data.frame(Variable = names(mean_values), Mean = mean_values)
# Sort in descending order
mean_df <- mean_df[order(mean_df$Mean, decreasing = TRUE), ]
# Plot bar chart with highest value on top and number labels
ggplot(mean_df, aes(x = reorder(Variable, Mean), y = Mean)) +
geom_bar(stat = "identity", fill = "#DC143C") +
geom_text(aes(label = round(Mean, 2)), hjust = -0.2) +
coord_flip() +
theme_minimal() +
labs(title = "Mean Service Ratings: Economy Class", x = "Variable", y = "Mean Value")
The chart above shows the mean service ratings of Economy class passengers. The lowest-rated services are Ease of Online Booking, In-flight Wifi Service, and Online Boarding, while Comfortability has the highest score. These results can guide strategic improvements to the Economy class experience by focusing on the low-scoring satisfaction factors.
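The three chart blocks above repeat the same filter, drop, and mean steps. A hypothetical helper such as `segment_means()` (not part of the original analysis) could compute the means for any segment from an explicit list of the individual 0-5 rating columns, which also keeps the 0-15 and 0-10 composite sums out of the comparison. A minimal sketch on hypothetical toy data:

```r
# Hypothetical helper: mean of each 0-5 rating within one segment,
# sorted so the lowest-rated services come first
segment_means <- function(df, keep, rating_cols) {
  seg <- df[keep, rating_cols, drop = FALSE]
  sort(colMeans(seg, na.rm = TRUE))
}

# Toy data (hypothetical values) standing in for the survey columns
toy <- data.frame(
  Class                  = c("Economy", "Economy", "Business", "Economy"),
  Ease.of.Online.Booking = c(2, 1, 4, 3),
  In.flight.Wifi.Service = c(2, 2, 5, 2),
  Seat.Comfort           = c(4, 5, 5, 3)
)
ratings <- c("Ease.of.Online.Booking", "In.flight.Wifi.Service", "Seat.Comfort")
m <- segment_means(toy, toy$Class == "Economy", ratings)
m
```

On the real data the call would look like `segment_means(data, data$Class == "Economy", rating_cols)`, with `rating_cols` listing the individual service rating columns.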
Lorem Ipsum
PCA reduces the dimensionality of the dataset by replacing correlated variables with a smaller set of uncorrelated components, while retaining as much of the original variance as possible.
PCA with Numeric Variables Only
# Drop ID and the transformed copies, keep the numeric columns, and standardize them
data_standardized <- data %>%
select(-c(ID, Arrival.Delay.Duration.log, Arrival.Delay.Duration.log.z,
Departure.Delay.Duration.log, Departure.Delay.Duration.log.z, Flight.Distance.log,
Flight.Distance.log.z,Departure.Delay.Duration, Arrival.Delay.Duration,
Age.log, Age.log.z)) %>%
select(where(is.numeric)) %>%
scale(center = TRUE, scale = TRUE)
results <- prcomp(data_standardized)
summary(results)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.500 1.7009 1.5378 1.214 1.20323 1.03115 1.00626
## Proportion of Variance 0.284 0.1315 0.1075 0.067 0.06581 0.04833 0.04603
## Cumulative Proportion 0.284 0.4155 0.5230 0.590 0.65581 0.70415 0.75017
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.96207 0.93256 0.73985 0.73395 0.72384 0.70377 0.65400
## Proportion of Variance 0.04207 0.03953 0.02488 0.02449 0.02382 0.02251 0.01944
## Cumulative Proportion 0.79224 0.83177 0.85665 0.88114 0.90496 0.92747 0.94691
## PC15 PC16 PC17 PC18 PC19 PC20
## Standard deviation 0.60816 0.54522 0.53207 0.4666 1.468e-14 8.693e-15
## Proportion of Variance 0.01681 0.01351 0.01287 0.0099 0.000e+00 0.000e+00
## Cumulative Proportion 0.96372 0.97723 0.99010 1.0000 1.000e+00 1.000e+00
## PC21 PC22
## Standard deviation 7.497e-15 6.465e-15
## Proportion of Variance 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00
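The "Cumulative Proportion" row is the running sum of the squared standard deviations (the eigenvalues) divided by their total. The sketch below, on simulated data, computes it directly and picks the smallest number of components that reaches 90%. It also shows why the last four standard deviations in the summary above are numerically zero (around 1e-14): when one column is an exact linear combination of others, as each composite score is of its component ratings, one principal component has no variance left to explain.

```r
set.seed(1)
a <- rnorm(200); b <- rnorm(200); c3 <- rnorm(200)
# d is an exact sum of a and b, like the composite service scores
X <- scale(cbind(a = a, b = b, c3 = c3, d = a + b))
pca <- prcomp(X)
cum_prop <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_components_90 <- which(cum_prop >= 0.90)[1]   # first component reaching 90%
round(pca$sdev, 4)   # the last standard deviation is numerically zero
n_components_90
```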
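The cumulative proportion of variance exceeds 90% with 12 principal components (0.905 at PC12). The standard deviations of PC19 to PC22 are numerically zero because the four composite scores (In.flight.service, Flight.Entertainment, Pre.flight.service, Comfortability) are exact sums of rating columns that are also included. The loadings of the first two components are: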
results$rotation[ , c(1:2) ]
## PC1 PC2
## Age -0.047872121 0.0336385640
## Flight.Distance -0.078332845 -0.0018773705
## Departure.Delay 0.006573192 -0.0033750811
## Arrival.Delay 0.018111380 0.0007851822
## Departure.and.Arrival.Time.Convenience -0.077717374 0.3150621228
## Ease.of.Online.Booking -0.145668271 0.4799177326
## Check.in.Service -0.140699313 0.0035210646
## Online.Boarding -0.230789748 0.2130984022
## Gate.Location -0.053614742 0.3229454746
## On.board.Service -0.211466635 -0.1400659693
## Seat.Comfort -0.266569667 -0.1141431987
## Leg.Room.Service -0.177631249 -0.0781383076
## Cleanliness -0.272995212 -0.1451299221
## Food.and.Drink -0.237436436 -0.1385787618
## In.flight.Service -0.197822341 -0.1475750486
## In.flight.Wifi.Service -0.202622660 0.4003442397
## In.flight.Entertainment -0.328215094 -0.1898314416
## Baggage.Handling -0.186841650 -0.1266896907
## In.flight.service -0.315027598 -0.2065932259
## Flight.Entertainment -0.341688353 0.1347743608
## Pre.flight.service -0.250851327 0.3487749068
## Comfortability -0.328679031 -0.1546031847
To focus on the variables with high loadings, we keep only the first principal component and select the variables whose absolute loading is at least 0.20:
loadings <- data.frame(results$rotation)
threshold <- 0.20
indices <- abs(loadings$PC1) >= threshold
data.frame(variable = rownames(loadings)[indices],
loading = loadings$PC1[indices])
## variable loading
## 1 Online.Boarding -0.2307897
## 2 On.board.Service -0.2114666
## 3 Seat.Comfort -0.2665697
## 4 Cleanliness -0.2729952
## 5 Food.and.Drink -0.2374364
## 6 In.flight.Wifi.Service -0.2026227
## 7 In.flight.Entertainment -0.3282151
## 8 In.flight.service -0.3150276
## 9 Flight.Entertainment -0.3416884
## 10 Pre.flight.service -0.2508513
## 11 Comfortability -0.3286790
Lorem Ipsum
loadings_df <- data.frame(
variable = rownames(loadings)[indices],
loading = loadings$PC1[indices]
)
# Plot
ggplot(loadings_df, aes(x = reorder(variable, -loading), y = loading, fill = loading > 0)) +
geom_col(show.legend = FALSE) +
coord_flip() +
labs(title = "Top Loadings for PC1",
x = "Variable",
y = "Loading Value") +
scale_fill_manual(values = c("steelblue", "tomato")) +
theme_minimal()
Top five contributors (by absolute loading) to PC1–PC12:
loadings <- as.data.frame(results$rotation)
loadings$Variable <- rownames(loadings)
# Top contributors for PC1–PC12
top_vars <- apply(abs(loadings[, 1:12]), 2, function(x) names(sort(x, decreasing = TRUE)[1:5]))
print(top_vars)
## PC1 PC2
## [1,] "Flight.Entertainment" "Ease.of.Online.Booking"
## [2,] "Comfortability" "In.flight.Wifi.Service"
## [3,] "In.flight.Entertainment" "Pre.flight.service"
## [4,] "In.flight.service" "Gate.Location"
## [5,] "Cleanliness" "Departure.and.Arrival.Time.Convenience"
## PC3 PC4 PC5
## [1,] "In.flight.Service" "Check.in.Service" "Departure.Delay"
## [2,] "Baggage.Handling" "Online.Boarding" "Arrival.Delay"
## [3,] "On.board.Service" "Pre.flight.service" "Check.in.Service"
## [4,] "Seat.Comfort" "Gate.Location" "Flight.Distance"
## [5,] "Cleanliness" "Arrival.Delay" "Pre.flight.service"
## PC6 PC7
## [1,] "Check.in.Service" "Departure.and.Arrival.Time.Convenience"
## [2,] "Flight.Distance" "Gate.Location"
## [3,] "Age" "Age"
## [4,] "Leg.Room.Service" "In.flight.Wifi.Service"
## [5,] "Pre.flight.service" "Online.Boarding"
## PC8 PC9
## [1,] "Age" "Leg.Room.Service"
## [2,] "Flight.Distance" "Flight.Distance"
## [3,] "Leg.Room.Service" "Comfortability"
## [4,] "Comfortability" "In.flight.service"
## [5,] "On.board.Service" "Food.and.Drink"
## PC10
## [1,] "Arrival.Delay"
## [2,] "Departure.Delay"
## [3,] "Gate.Location"
## [4,] "Departure.and.Arrival.Time.Convenience"
## [5,] "Flight.Distance"
## PC11
## [1,] "Departure.and.Arrival.Time.Convenience"
## [2,] "Gate.Location"
## [3,] "Baggage.Handling"
## [4,] "On.board.Service"
## [5,] "Leg.Room.Service"
## PC12
## [1,] "Baggage.Handling"
## [2,] "Departure.and.Arrival.Time.Convenience"
## [3,] "Food.and.Drink"
## [4,] "On.board.Service"
## [5,] "Check.in.Service"
PC1 – In-Flight Amenities & Comfort: Driven by Flight.Entertainment, Comfortability, In.flight.Entertainment, In.flight.service, and Cleanliness. This component captures the overall in-flight experience, focused on comfort and entertainment.
PC2 – Digital & Booking Experience: Influenced by Ease.of.Online.Booking, In.flight.Wifi.Service, and Pre.flight.service. It represents passengers' interaction with digital services before and during the flight.
PC3 – Operational Onboard Service: Dominated by In.flight.Service, Baggage.Handling, On.board.Service, and Seat.Comfort, this component reflects service reliability and seat-related comfort during the flight.
PC4 – Check-in & Boarding: Characterized by Check.in.Service, Online.Boarding, and Pre.flight.service. It relates to the efficiency of the check-in and boarding process.
PC5 – Flight Punctuality: Led by Departure.Delay and Arrival.Delay, with smaller contributions from Check.in.Service and Flight.Distance, this component reflects delays and journey length.
PC6 – Pre-Flight & Cabin Space: Defined by Check.in.Service, Flight.Distance, Age, and Leg.Room.Service, combining service before boarding with cabin comfort.
PC7 – Schedule & Gate Convenience: Led by Departure.and.Arrival.Time.Convenience, Gate.Location, and Age, this component captures how convenient flight times and gate locations are for passengers.
PC8 – Age-Based Travel Patterns: Mainly driven by Age, Flight.Distance, and Leg.Room.Service, this component shows how age may influence travel experience and expectations.
PC9 – Comfort in Long Trips: Includes Leg.Room.Service, Flight.Distance, and Comfortability, highlighting comfort needs on longer flights.
PC10 – Delays & Terminal Logistics: Influenced by Arrival.Delay, Departure.Delay, and Gate.Location, this component represents disruptions tied to flight timing and gate logistics.
PC11 – Schedule & Ground Handling: Driven by Departure.and.Arrival.Time.Convenience, Gate.Location, and Baggage.Handling, combining schedule convenience with terminal and baggage logistics.
PC12 – Baggage & Service Coordination: Led by Baggage.Handling, Departure.and.Arrival.Time.Convenience, Food.and.Drink, and On.board.Service, mixing ground handling with onboard service.
screeplot(results, type = "lines")
Lorem Ipsum
PCA with One-Hot Encoding
library(caret)
## Loading required package: lattice
# Drop ID and the transformed copies; categorical columns are one-hot encoded below
data_standardized <- data %>%
select(-c(ID, Arrival.Delay.Duration.log, Arrival.Delay.Duration.log.z,
Departure.Delay.Duration.log, Departure.Delay.Duration.log.z, Flight.Distance.log,
Flight.Distance.log.z,Departure.Delay.Duration, Arrival.Delay.Duration,
Age.log, Age.log.z))
dummies <- dummyVars(~ ., data = data_standardized)
data_encoded <- predict(dummies, newdata = data_standardized)
data_encoded_standardized <- scale(data_encoded, center = TRUE, scale = TRUE)
results <- prcomp(data_encoded_standardized)
summary(results)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.8194 2.3912 1.89722 1.74782 1.69078 1.55506 1.43131
## Proportion of Variance 0.1728 0.1243 0.07825 0.06641 0.06215 0.05257 0.04454
## Cumulative Proportion 0.1728 0.2971 0.37535 0.44176 0.50391 0.55648 0.60101
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.40361 1.31121 1.22660 1.19899 1.17702 1.11575 1.05445
## Proportion of Variance 0.04283 0.03738 0.03271 0.03125 0.03012 0.02706 0.02417
## Cumulative Proportion 0.64384 0.68122 0.71393 0.74518 0.77529 0.80236 0.82653
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 1.00270 0.98245 0.94601 0.8525 0.72984 0.71330 0.69079
## Proportion of Variance 0.02186 0.02098 0.01946 0.0158 0.01158 0.01106 0.01037
## Cumulative Proportion 0.84839 0.86937 0.88882 0.9046 0.91620 0.92726 0.93764
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.67703 0.64554 0.6067 0.54605 0.51879 0.51573 0.47762
## Proportion of Variance 0.00996 0.00906 0.0080 0.00648 0.00585 0.00578 0.00496
## Cumulative Proportion 0.94760 0.95666 0.9647 0.97114 0.97700 0.98278 0.98774
## PC29 PC30 PC31 PC32 PC33 PC34
## Standard deviation 0.4551 0.39657 0.35601 0.27021 7.748e-13 5.841e-14
## Proportion of Variance 0.0045 0.00342 0.00276 0.00159 0.000e+00 0.000e+00
## Cumulative Proportion 0.9922 0.99566 0.99841 1.00000 1.000e+00 1.000e+00
## PC35 PC36 PC37 PC38 PC39
## Standard deviation 4.501e-14 2.911e-14 2.163e-14 1.675e-14 1.407e-14
## Proportion of Variance 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
## PC40 PC41 PC42 PC43 PC44 PC45
## Standard deviation 1.398e-14 1.171e-14 7.92e-15 6.24e-15 5.778e-15 4.89e-15
## Proportion of Variance 0.000e+00 0.000e+00 0.00e+00 0.00e+00 0.000e+00 0.00e+00
## Cumulative Proportion 1.000e+00 1.000e+00 1.00e+00 1.00e+00 1.000e+00 1.00e+00
## PC46
## Standard deviation 4.499e-15
## Proportion of Variance 0.000e+00
## Cumulative Proportion 1.000e+00
When PCA was applied to the numeric variables only, the first 12 components explained over 90% of the variance, indicating strong correlation between variables and a compact data structure. After adding the one-hot encoded categorical variables, the number of components needed to reach the same level of variance increased to 18 (cumulative proportion 0.905 at PC18). This suggests that the categorical variables contribute additional but more dispersed variance, requiring more dimensions to capture the overall structure. Note that the `~ .` formula in dummyVars() also encodes the Satisfaction target, so these components partly reflect the target and should not be used as classifier inputs without removing it.
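caret's `dummyVars()` creates one column per level by default, so the dummies of each factor always sum to 1 and each encoded factor adds one exactly redundant dimension. Together with the redundancy from the composite scores, this is consistent with the fourteen numerically zero components (PC33 to PC46) in the summary above. A minimal base-R sketch on simulated data, using `model.matrix()` to build both the all-levels encoding and a full-rank encoding that drops a reference level (which is what `fullRank = TRUE` does in `dummyVars()`):

```r
set.seed(2)
toy <- data.frame(
  x     = rnorm(100),
  Class = factor(sample(c("Business", "Economy", "Economy Plus"), 100, replace = TRUE))
)
# All levels: the three Class dummies always sum to 1
full <- cbind(x = toy$x, model.matrix(~ Class - 1, data = toy))
# Full rank: a reference level is dropped (intercept column removed afterwards)
reduced <- model.matrix(~ x + Class, data = toy)[, -1]
pc_full    <- prcomp(full, scale. = TRUE)
pc_reduced <- prcomp(reduced, scale. = TRUE)
c(ncol(full), ncol(reduced))
tail(pc_full$sdev, 1)      # numerically zero: one redundant dimension
tail(pc_reduced$sdev, 1)   # clearly positive
```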
Lorem Ipsum
# Create a copy of the dataset for classification only; convert all columns except ID and the continuous variables to factors
data_clsf <- data %>%
mutate(across(-c(ID, Age, Flight.Distance, Departure.Delay, Arrival.Delay,
Departure.Delay.Duration,
Arrival.Delay.Duration, Arrival.Delay.Duration.log, Arrival.Delay.Duration.log.z,
Departure.Delay.Duration.log, Departure.Delay.Duration.log.z, Flight.Distance.log,
Flight.Distance.log.z, Age.log, Age.log.z), as.factor))
# Encode Target Variable
data_clsf$Satisfaction <- ifelse(data_clsf$Satisfaction=="Satisfied",1,0)
data_clsf$Satisfaction <- factor(data_clsf$Satisfaction, levels = c(0, 1))
# For the experiment, create three feature sets: untransformed, log-transformed, and standardized (log + z-score)
data_clsf_ori <- data_clsf %>%
select(-c(ID, Arrival.Delay.Duration.log, Arrival.Delay.Duration.log.z,
Departure.Delay.Duration.log, Departure.Delay.Duration.log.z, Flight.Distance.log,
Flight.Distance.log.z, Age.log, Age.log.z))
data_clsf_log <- data_clsf %>%
select(-c(ID, Arrival.Delay.Duration.log.z, Departure.Delay.Duration.log.z, Flight.Distance.log.z,
Age.log.z, Departure.Delay.Duration, Arrival.Delay.Duration, Flight.Distance, Age))
data_clsf_norm <- data_clsf %>%
select(-c(ID, Arrival.Delay.Duration.log,
Departure.Delay.Duration.log, Flight.Distance.log,
Age.log, Arrival.Delay.Duration,
Departure.Delay.Duration, Flight.Distance, Age))
Lorem Ipsum
Start with the Untransformed Dataset
Split the Dataset into Training and Testing Sets
set.seed(46748717)
library(rsample)
## Warning: package 'rsample' was built under R version 4.4.3
proportion <- 0.7
split <- initial_split(data_clsf_ori, prop = proportion)
training <- training(split)
testing <- testing(split)
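`initial_split()` above draws a simple random 70/30 split. rsample's `initial_split()` also accepts a `strata` argument (for example `initial_split(data_clsf_ori, prop = 0.7, strata = Satisfaction)`) that keeps the satisfied/dissatisfied ratio the same in both sets. For illustration, a base-R sketch of the same idea with a hypothetical helper `stratified_split()` on toy labels:

```r
# Hypothetical helper: sample `prop` of the rows within each class of strata_col
stratified_split <- function(df, strata_col, prop = 0.7) {
  idx_by_class <- split(seq_len(nrow(df)), df[[strata_col]])
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(prop * length(idx)))]
  }), use.names = FALSE)
  list(training = df[train_idx, , drop = FALSE],
       testing  = df[-train_idx, , drop = FALSE])
}

set.seed(46748717)
toy <- data.frame(Satisfaction = factor(rep(c(0, 1), times = c(57, 43))),  # toy labels
                  score = runif(100))
parts <- stratified_split(toy, "Satisfaction")
table(parts$training$Satisfaction)   # 70% of each class
```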
Lorem Ipsum
model <- C50::C5.0(Satisfaction ~.,
data = training)
summary(model)
##
## Call:
## C5.0.formula(formula = Satisfaction ~ ., data = training)
##
##
## C5.0 [Release 2.07 GPL Edition] Fri Apr 4 11:52:55 2025
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 90916 cases (34 attributes) from undefined.data
##
## Decision tree:
##
## In.flight.Wifi.Service in {0,5}:
## :...Cleanliness = 0: 0 (4)
## : Cleanliness in {1,2,3,4,5}:
## : :...Ease.of.Online.Booking in {0,5}:
## : :...Flight.Entertainment in {0,1,2,3,4,5,7,8,9}: 1 (6785)
## : : Flight.Entertainment in {6,10}:
## : : :...In.flight.Service in {0,1,2,4,5}: 1 (2966/7)
## : : In.flight.Service = 3:
## : : :...Online.Boarding in {0,1,2,3,5}: 1 (192)
## : : Online.Boarding = 4:
## : : :...Cleanliness in {1,2,3,4}: 1 (12)
## : : Cleanliness = 5: 0 (11/2)
## : Ease.of.Online.Booking in {1,2,3,4}:
## : :...Online.Boarding in {0,1,5}: 1 (1821/3)
## : Online.Boarding in {2,3,4}:
## : :...In.flight.Entertainment in {0,1,2,3,4}: 1 (448/3)
## : In.flight.Entertainment = 5:
## : :...Leg.Room.Service = 0: 1 (0)
## : Leg.Room.Service = 5:
## : :...On.board.Service in {0,1,3,4,5}: 1 (341)
## : : On.board.Service = 2:
## : : :...Age <= 35: 0 (3)
## : : Age > 35: 1 (3)
## : Leg.Room.Service in {1,2,3,4}:
## : :...Customer.Type = First-time: 1 (15)
## : Customer.Type = Returning:
## : :...Type.of.Travel = Personal: 1 (15)
## : Type.of.Travel = Business:
## : :...Check.in.Service = 0: 0 (0)
## : Check.in.Service = 5: 1 (6)
## : Check.in.Service in {1,2,3,4}: [S1]
## In.flight.Wifi.Service in {1,2,3,4}:
## :...Online.Boarding in {0,1,2,3}:
## :...In.flight.Wifi.Service = 4:
## : :...Gate.Location = 0: 0 (0)
## : : Gate.Location in {1,2,3,5}:
## : : :...Class = Business:
## : : : :...Customer.Type = Returning: 0 (475/12)
## : : : : Customer.Type = First-time:
## : : : : :...Age <= 24: 1 (28/2)
## : : : : Age > 24:
## : : : : :...In.flight.service in {1,2,3,4,5,6,7,8,12,
## : : : : : 15}: 1 (11)
## : : : : In.flight.service in {9,10,11,13,14}:
## : : : : :...Pre.flight.service in {6,8,9}: 1 (19/6)
## : : : : Pre.flight.service in {1,2,3,4,5,7,12,13,14,
## : : : : : 15}: 0 (1)
## : : : : Pre.flight.service = 10:
## : : : : :...Departure.Delay <= 5: 0 (16/4)
## : : : : : Departure.Delay > 5: 1 (2)
## : : : : Pre.flight.service = 11:
## : : : : :...In.flight.Service = 4: 1 (7/2)
## : : : : In.flight.Service in {0,1,2,3,
## : : : : 5}: 0 (6)
## : : : Class in {Economy,Economy Plus}:
## : : : :...Type.of.Travel = Personal: 0 (500/143)
## : : : Type.of.Travel = Business:
## : : : :...Online.Boarding = 0: 1 (0)
## : : : Online.Boarding in {1,2}:
## : : : :...In.flight.Service = 0: 1 (0)
## : : : : In.flight.Service in {1,2}: 0 (7)
## : : : : In.flight.Service in {3,4,5}:
## : : : : :...Flight.Distance <= 1487: 1 (215/31)
## : : : : Flight.Distance > 1487: 0 (13/3)
## : : : Online.Boarding = 3:
## : : : :...Check.in.Service = 0: 0 (0)
## : : : Check.in.Service = 5: 1 (32/7)
## : : : Check.in.Service in {1,2,3,4}:
## : : : :...Seat.Comfort = 0: 0 (0)
## : : : Seat.Comfort in {1,5}:
## : : : :...Gender = Male: 0 (10/1)
## : : : : Gender = Female:
## : : : : :...Leg.Room.Service = 0: 1 (0)
## : : : : Leg.Room.Service in {3,5}: 0 (5)
## : : : : Leg.Room.Service in {1,2,4}:
## : : : : :...Baggage.Handling in {1,2,4,
## : : : : : 5}: 1 (39/2)
## : : : : Baggage.Handling = 3: 0 (5/1)
## : : : Seat.Comfort in {2,3,4}:
## : : : :...Cleanliness = 0: 0 (0)
## : : : Cleanliness = 5:
## : : : :...Customer.Type = First-time: 0 (2)
## : : : : Customer.Type = Returning: 1 (12)
## : : : Cleanliness in {1,2,3,4}:
## : : : :...Baggage.Handling in {2,3,
## : : : : 4}: 0 (193/42)
## : : : Baggage.Handling = 1:
## : : : :...Comfortability in {3,4,5,6,7,8,10,
## : : : : : 12,14,
## : : : : : 15}: 0 (5)
## : : : : Comfortability in {9,11,13}: 1 (5)
## : : : Baggage.Handling = 5:
## : : : :...Gender = Female: 0 (11/3)
## : : : Gender = Male: 1 (5)
## : : Gate.Location = 4:
## : : :...Type.of.Travel = Personal: 0 (226/52)
## : : Type.of.Travel = Business:
## : : :...Customer.Type = First-time:
## : : :...Class in {Economy,Economy Plus}:
## : : : :...Baggage.Handling in {1,5}: 1 (9/3)
## : : : : Baggage.Handling in {2,3,4}: 0 (31/2)
## : : : Class = Business:
## : : : :...arrival.delay.status = Delayed: 0 (2)
## : : : arrival.delay.status = On Time:
## : : : :...Baggage.Handling = 3: 0 (1)
## : : : Baggage.Handling in {1,2,5}: 1 (15)
## : : : Baggage.Handling = 4:
## : : : :...In.flight.Service = 3: 0 (4/1)
## : : : In.flight.Service in {0,1,2,4,
## : : : 5}: 1 (13)
## : : Customer.Type = Returning:
## : : :...In.flight.Entertainment = 0: 1 (0)
## : : In.flight.Entertainment in {2,3,5}:
## : : :...Class = Economy Plus: 1 (3/1)
## : : : Class = Economy:
## : : : :...Cleanliness in {0,1,2,3,5}: 1 (6)
## : : : : Cleanliness = 4: 0 (3)
## : : : Class = Business:
## : : : :...Baggage.Handling in {1,2,3,5}: 1 (373/1)
## : : : Baggage.Handling = 4:
## : : : :...In.flight.Service in {0,1,4,5}: 1 (62)
## : : : In.flight.Service in {2,3}:
## : : : :...Food.and.Drink in {0,1,4,
## : : : : 5}: 0 (6)
## : : : Food.and.Drink in {2,3}: 1 (5/2)
## : : In.flight.Entertainment in {1,4}:
## : : :...Seat.Comfort in {0,1,5}: 1 (140/1)
## : : Seat.Comfort in {2,3,4}:
## : : :...Leg.Room.Service = 0: 1 (0)
## : : Leg.Room.Service in {1,2,3}: 0 (34/6)
## : : Leg.Room.Service in {4,5}:
## : : :...Cleanliness in {0,5}: 1 (43)
## : : Cleanliness in {1,2,3,4}:
## : : :...Check.in.Service in {0,
## : : : 5}: 1 (30)
## : : Check.in.Service in {1,2,3,4}:
## : : :...Baggage.Handling in {1,2,
## : : : 5}: 1 (14)
## : : Baggage.Handling = 3: [S2]
## : : Baggage.Handling = 4: [S3]
## : In.flight.Wifi.Service in {1,2,3}:
## : :...Class = Business:
## : :...In.flight.Entertainment in {4,5}:
## : : :...Customer.Type = First-time: 0 (1417/51)
## : : : Customer.Type = Returning:
## : : : :...Type.of.Travel = Personal: 0 (338)
## : : : Type.of.Travel = Business:
## : : : :...Gate.Location = 0: 1 (0)
## : : : Gate.Location in {4,5}: 0 (59)
## : : : Gate.Location in {1,2,3}: [S4]
## : : In.flight.Entertainment in {0,1,2,3}:
## : : :...Cleanliness = 5:
## : : :...Type.of.Travel = Personal: 0 (16)
## : : : Type.of.Travel = Business:
## : : : :...Customer.Type = First-time: 0 (3)
## : : : Customer.Type = Returning: 1 (46/2)
## : : Cleanliness in {0,1,2,3,4}:
## : : :...Gate.Location in {0,4,5}: 0 (2747/12)
## : : Gate.Location in {1,2,3}:
## : : :...Flight.Entertainment in {0,1,7,8,9,
## : : : 10}: 0 (0)
## : : Flight.Entertainment in {2,4,6}:
## : : :...Check.in.Service = 0: 0 (0)
## : : : Check.in.Service = 5:
## : : : :...Customer.Type = First-time: 0 (167/4)
## : : : : Customer.Type = Returning:
## : : : : :...Type.of.Travel = Business: 1 (68/1)
## : : : : Type.of.Travel = Personal: 0 (24)
## : : : Check.in.Service in {1,2,3,4}:
## : : : :...Seat.Comfort = 0: 0 (0)
## : : : Seat.Comfort = 5:
## : : : :...Customer.Type = First-time: 0 (33/1)
## : : : : Customer.Type = Returning:
## : : : : :...Type.of.Travel = Business: 1 (19/1)
## : : : : Type.of.Travel = Personal: 0 (7)
## : : : Seat.Comfort in {1,2,3,4}:
## : : : :...In.flight.Service = 5:
## : : : :...Customer.Type = First-time: 0 (193/3)
## : : : : Customer.Type = Returning: [S5]
## : : : In.flight.Service in {0,1,2,3,4}:
## : : : :...Baggage.Handling = 5: [S6]
## : : : Baggage.Handling in {1,2,3,4}: [S7]
## : : Flight.Entertainment in {3,5}:
## : : :...Customer.Type = First-time: 0 (612/18)
## : : Customer.Type = Returning:
## : : :...Type.of.Travel = Personal: 0 (175)
## : : Type.of.Travel = Business:
## : : :...Flight.Distance <= 291: 0 (48/1)
## : : Flight.Distance > 291:
## : : :...In.flight.Entertainment in {0,1}: [S8]
## : : In.flight.Entertainment in {2,3}:
## : : :...Cleanliness = 0: 1 (0)
## : : Cleanliness = 1: [S9]
## : : Cleanliness in {2,3,4}: [S10]
## : Class in {Economy,Economy Plus}:
## : :...Type.of.Travel = Personal: 0 (16721)
## : Type.of.Travel = Business:
## : :...Customer.Type = First-time: 0 (7240/40)
## : Customer.Type = Returning:
## : :...Check.in.Service = 0: 0 (0)
## : Check.in.Service = 5: 1 (103/1)
## : Check.in.Service in {1,2,3,4}:
## : :...Baggage.Handling = 5:
## : :...In.flight.Wifi.Service = 1: 0 (2)
## : : In.flight.Wifi.Service in {2,3}: 1 (49)
## : Baggage.Handling in {1,2,3,4}:
## : :...In.flight.Service = 0: 0 (0)
## : In.flight.Service = 5: 1 (35/1)
## : In.flight.Service in {1,2,3,4}:
## : :...Seat.Comfort = 0: 0 (0)
## : Seat.Comfort = 5:
## : :...In.flight.Wifi.Service = 1: 0 (12)
## : : In.flight.Wifi.Service in {2,
## : : 3}: 1 (22/1)
## : Seat.Comfort in {1,2,3,4}:
## : :...Cleanliness = 0: 0 (0)
## : Cleanliness = 5: 1 (14)
## : Cleanliness in {1,2,3,4}:
## : :...On.board.Service = 0: 0 (0)
## : On.board.Service = 5: [S11]
## : On.board.Service in {1,2,3,4}:
## : :...Age <= 33: 0 (1150)
## : Age > 33: [S12]
## Online.Boarding in {4,5}:
## :...Type.of.Travel = Personal: 0 (8120/932)
## Type.of.Travel = Business:
## :...Comfortability in {11,12,13,14,15}:
## :...Customer.Type = Returning:
## : :...Class in {Economy,Economy Plus}:
## : : :...Baggage.Handling in {1,2,5}:
## : : : :...Baggage.Handling in {1,5}: 1 (400/3)
## : : : : Baggage.Handling = 2:
## : : : : :...Flight.Entertainment in {0,1,2,3,
## : : : : : 10}: 1 (0)
## : : : : Flight.Entertainment in {5,7}:
## : : : : :...Online.Boarding = 4: 0 (9/2)
## : : : : : Online.Boarding = 5: 1 (1)
## : : : : Flight.Entertainment in {4,6,8,9}:
## : : : : :...In.flight.Service in {0,4,5}: 1 (86)
## : : : : In.flight.Service in {1,2,3}:
## : : : : :...Gender = Female: 1 (46/2)
## : : : : Gender = Male: [S13]
## : : : Baggage.Handling in {3,4}:
## : : : :...Check.in.Service in {0,5}: 1 (117)
## : : : Check.in.Service in {1,2,3,4}:
## : : : :...Online.Boarding = 5: 1 (65)
## : : : Online.Boarding = 4:
## : : : :...In.flight.Service = 0: 1 (0)
## : : : In.flight.Service in {1,2,5}: [S14]
## : : : In.flight.Service in {3,4}:
## : : : :...On.board.Service = 0: 0 (0)
## : : : On.board.Service = 5: [S15]
## : : : On.board.Service in {1,2,3,4}:
## : : : :...Seat.Comfort in {0,
## : : : : 1}: 0 (0)
## : : : Seat.Comfort in {2,3,4}:
## : : : :...Cleanliness in {0,1,2,3,
## : : : : : 4}: 0 (374/108)
## : : : : Cleanliness = 5: 1 (10)
## : : : Seat.Comfort = 5: [S16]
## : : Class = Business:
## : : :...Check.in.Service = 0: 1 (0)
## : : Check.in.Service in {3,4,5}:
## : : :...Leg.Room.Service = 0: 1 (0)
## : : : Leg.Room.Service in {1,2,4,5}:
## : : : :...In.flight.Wifi.Service in {1,2,
## : : : : : 3}: 1 (10932/12)
## : : : : In.flight.Wifi.Service = 4:
## : : : : :...Gate.Location = 4: 1 (3686/19)
## : : : : Gate.Location in {0,5}: 0 (23/1)
## : : : : Gate.Location in {1,2,3}:
## : : : : :...Online.Boarding = 4: 0 (58)
## : : : : Online.Boarding = 5: 1 (1)
## : : : Leg.Room.Service = 3:
## : : : :...Food.and.Drink = 0: 1 (0)
## : : : Food.and.Drink = 1: 0 (5)
## : : : Food.and.Drink in {2,3,4,5}:
## : : : :...In.flight.Wifi.Service in {1,
## : : : : 2}: 1 (661/2)
## : : : In.flight.Wifi.Service in {3,4}:
## : : : :...Gate.Location in {0,1,2,
## : : : : 5}: 0 (53)
## : : : Gate.Location in {3,4}: 1 (654/24)
## : : Check.in.Service in {1,2}:
## : : :...In.flight.Wifi.Service in {1,2}:
## : : :...Gate.Location in {1,2}: 1 (333/1)
## : : : Gate.Location in {0,3,4,5}: 0 (4)
## : : In.flight.Wifi.Service in {3,4}:
## : : :...Gate.Location in {0,1,2,5}: 0 (77)
## : : Gate.Location in {3,4}:
## : : :...Online.Boarding = 5: 1 (176)
## : : Online.Boarding = 4:
## : : :...In.flight.Service in {0,1,2,
## : : : 5}: 1 (71/2)
## : : In.flight.Service in {3,4}:
## : : :...Seat.Comfort in {0,1}: 0 (0)
## : : Seat.Comfort in {2,5}: 1 (17)
## : : Seat.Comfort in {3,4}: [S17]
## : Customer.Type = First-time:
## : :...In.flight.Wifi.Service in {1,2,3}: 0 (227/4)
## : In.flight.Wifi.Service = 4:
## : :...Check.in.Service = 0: 1 (0)
## : Check.in.Service in {1,2}:
## : :...In.flight.Service = 0: 0 (0)
## : : In.flight.Service in {1,2,5}:
## : : :...Age_Range = Adult: 0 (41/10)
## : : : Age_Range = Children: 1 (1)
## : : : Age_Range = Senior:
## : : : :...In.flight.Service in {1,2}: 0 (3)
## : : : : In.flight.Service = 5: 1 (2)
## : : : Age_Range = Youth:
## : : : :...Baggage.Handling in {1,2,3}: 0 (14/6)
## : : : Baggage.Handling in {4,5}: 1 (11)
## : : In.flight.Service in {3,4}:
## : : :...Gate.Location in {0,3,4}: 0 (129/5)
## : : Gate.Location in {1,2,5}:
## : : :...Leg.Room.Service in {1,3}: 1 (5)
## : : Leg.Room.Service in {0,2,4,5}: 0 (20/4)
## : Check.in.Service in {3,4,5}:
## : :...In.flight.Service = 0: 1 (0)
## : In.flight.Service in {2,3}:
## : :...Gate.Location = 0: 0 (0)
## : : Gate.Location in {1,2}:
## : : :...arrival.delay.status = Delayed: 0 (3)
## : : : arrival.delay.status = On Time: 1 (39/13)
## : : Gate.Location in {3,4,5}: [S18]
## : In.flight.Service in {1,4,5}:
## : :...On.board.Service = 0: 1 (0)
## : On.board.Service in {1,2}:
## : :...In.flight.Service = 4: 0 (38/3)
## : : In.flight.Service in {1,5}:
## : : :...Comfortability in {11,13}: 1 (19/3)
## : : Comfortability in {14,15}: 0 (6/2)
## : : Comfortability = 12:
## : : :...Cleanliness in {0,1,2,3,4}: 0 (5)
## : : Cleanliness = 5: [S19]
## : On.board.Service in {3,4,5}:
## : :...Age_Range in {Children,Youth}:
## : :...Class = Economy Plus: 0 (1)
## : : Class = Business:
## : : :...Age <= 24: 1 (142/2)
## : : : Age > 24: [S20]
## : : Class = Economy: [S21]
## : Age_Range in {Adult,Senior}:
## : :...Class = Economy: 0 (26/3)
## : Class in {Business,Economy Plus}:
## : :...Leg.Room.Service = 1: 0 (2)
## : Leg.Room.Service in {0,3,
## : : 4}: 1 (145/54)
## : Leg.Room.Service = 2:
## : :...Gate.Location = 0: 1 (0)
## : : Gate.Location = 5: 0 (2)
## : : Gate.Location = 1: [S22]
## : : Gate.Location = 2:
## : : :...Gender = Female: 0 (5/1)
## : : : Gender = Male: 1 (3)
## : : Gate.Location = 3:
## : : :...Arrival.Delay <= 3: 1 (12/2)
## : : : Arrival.Delay > 3: 0 (2)
## : : Gate.Location = 4: [S23]
## : Leg.Room.Service = 5: [S24]
## Comfortability in {3,4,5,6,7,8,9,10}:
## :...Online.Boarding = 5:
## :...Customer.Type = Returning: 1 (721/3)
## : Customer.Type = First-time:
## : :...Class in {Economy,Economy Plus}:
## : :...Gate.Location in {0,2,3,4}: 0 (58/7)
## : : Gate.Location in {1,5}:
## : : :...Departure.Delay > 1: 0 (3)
## : : Departure.Delay <= 1:
## : : :...Ease.of.Online.Booking = 1: 0 (1)
## : : Ease.of.Online.Booking in {0,2,3,4,
## : : 5}: 1 (9)
## : Class = Business:
## : :...In.flight.Wifi.Service = 3: 1 (0)
## : In.flight.Wifi.Service = 1: 0 (3)
## : In.flight.Wifi.Service in {2,4}:
## : :...Age_Range in {Children,Senior,
## : : Youth}: 1 (29)
## : Age_Range = Adult:
## : :...Leg.Room.Service = 3: 0 (9/2)
## : Leg.Room.Service in {0,1,5}: 1 (4)
## : Leg.Room.Service = 2:
## : :...Age <= 36: 0 (2)
## : : Age > 36: 1 (4)
## : Leg.Room.Service = 4:
## : :...Gender = Female: 0 (2)
## : Gender = Male: 1 (3)
## Online.Boarding = 4:
## :...Flight.Entertainment in {0,1,10}: 0 (0)
## Flight.Entertainment in {2,4}:
## :...Check.in.Service = 0: 0 (0)
## : Check.in.Service = 5:
## : :...Customer.Type = First-time: 0 (4)
## : : Customer.Type = Returning: 1 (36)
## : Check.in.Service in {1,2,3,4}:
## : :...Seat.Comfort = 0: 0 (0)
## : Seat.Comfort in {2,5}:
## : :...Age_Range in {Adult,Senior,Youth}: 1 (22/1)
## : : Age_Range = Children: 0 (2)
## : Seat.Comfort in {1,3,4}:
## : :...In.flight.service in {1,2,14,
## : : 15}: 0 (0)
## : In.flight.service in {11,12,13}:
## : :...Cleanliness in {1,2,4}: 0 (9/1)
## : : Cleanliness in {0,3,5}: 1 (19)
## : In.flight.service in {3,4,5,6,7,8,9,10}:
## : :...In.flight.Entertainment in {4,
## : : 5}: 0 (0)
## : In.flight.Entertainment = 3:
## : :...Gate.Location in {0,2,3,4,5}: 0 (22)
## : : Gate.Location = 1:
## : : :...Departure.Delay <= 3: 1 (12)
## : : Departure.Delay > 3: 0 (2)
## : In.flight.Entertainment in {0,1,2}:
## : :...Comfortability in {3,4,5,6,7,8,9}:
## : :...Gate.Location = 0: 1 (1)
## : : Gate.Location in {1,3,4,
## : : : 5}: 0 (850/4)
## : : Gate.Location = 2:
## : : :...Cleanliness in {0,1,2,4,
## : : : 5}: 0 (180/2)
## : : Cleanliness = 3: [S25]
## : Comfortability = 10:
## : :...Gate.Location = 0: 0 (0)
## : Gate.Location in {1,3,4,5}: [S26]
## : Gate.Location = 2: [S27]
## Flight.Entertainment in {3,5,6,7,8,9}:
## :...In.flight.Service = 0: 1 (0)
## In.flight.Service = 3:
## :...Check.in.Service = 0: 0 (0)
## : Check.in.Service = 5:
## : :...In.flight.Entertainment in {0,3,4,
## : : : 5}: 1 (70)
## : : In.flight.Entertainment in {1,2}:
## : : :...Class = Business: 1 (2)
## : : Class in {Economy,Economy Plus}: 0 (8/1)
## : Check.in.Service in {1,2,3,4}:
## : :...Baggage.Handling in {1,5}:
## : :...Age_Range = Senior: 0 (4)
## : : Age_Range in {Adult,Children,Youth}:
## : : :...Customer.Type = Returning: 1 (57)
## : : Customer.Type = First-time:
## : : :...Departure.Delay <= 1: 1 (20/5)
## : : Departure.Delay > 1: 0 (5)
## : Baggage.Handling in {2,3,4}:
## : :...In.flight.service in {1,2,3,4,13,14,
## : : 15}: 0 (0)
## : In.flight.service = 12:
## : :...Customer.Type = First-time: 0 (4)
## : : Customer.Type = Returning: [S28]
## : In.flight.service in {5,6,7,8,9,10,11}:
## : :...In.flight.Wifi.Service = 3:
## : :...Comfortability in {3,
## : : : 4}: 0 (0)
## : : Comfortability = 5: 1 (2)
## : : Comfortability in {6,7,8,9,10}: [S29]
## : In.flight.Wifi.Service in {1,2,4}:
## : :...Customer.Type = First-time:
## : :...Gate.Location in {0,3,
## : : : 4}: 0 (208/16)
## : : Gate.Location in {1,2,5}: [S30]
## : Customer.Type = Returning:
## : :...Age_Range in {Children,Senior,
## : : Youth}: [S31]
## : Age_Range = Adult: [S32]
## In.flight.Service in {1,2,4,5}:
## :...Check.in.Service = 5:
## :...Customer.Type = Returning: 1 (354)
## : Customer.Type = First-time:
## : :...In.flight.Wifi.Service in {1,
## : : 2}: 1 (0)
## : In.flight.Wifi.Service = 3: 0 (14)
## : In.flight.Wifi.Service = 4:
## : :...Leg.Room.Service = 0: 1 (0)
## : Leg.Room.Service = 1: 0 (14/3)
## : Leg.Room.Service in {2,3,4,5}:
## : :...Age <= 30: 1 (230/47)
## : Age > 30:
## : :...Age <= 36: 0 (23)
## : Age > 36: [S33]
## Check.in.Service in {0,1,2,3,4}:
## :...In.flight.Service = 5:
## :...Customer.Type = Returning:
## : :...Ease.of.Online.Booking = 0: 0 (6)
## : : Ease.of.Online.Booking in {1,2,3,4,5}:
## : : :...Leg.Room.Service in {0,1,2,3,
## : : : 5}: 1 (327)
## : : Leg.Room.Service = 4: [S34]
## : Customer.Type = First-time:
## : :...In.flight.Wifi.Service in {1,
## : : 2}: 1 (0)
## : In.flight.Wifi.Service = 3: 0 (12/1)
## : In.flight.Wifi.Service = 4:
## : :...Age <= 24: 1 (147/27)
## : Age > 24: [S35]
## In.flight.Service in {1,2,4}:
## :...Baggage.Handling = 5:
## :...Customer.Type = Returning: 1 (141)
## : Customer.Type = First-time: [S36]
## Baggage.Handling in {1,2,3,4}:
## :...On.board.Service = 0: 0 (0)
## On.board.Service = 5:
## :...Customer.Type = Returning: [S37]
## : Customer.Type = First-time: [S38]
## On.board.Service in {1,2,3,4}:
## :...Customer.Type = First-time:
## :...Class = Business: [S39]
## : Class in {Economy,Economy Plus}: [S40]
## Customer.Type = Returning:
## :...Seat.Comfort in {1,2,5}:
## :...Age > 31: 1 (152/1)
## : Age <= 31: [S41]
## Seat.Comfort in {0,3,4}: [S42]
##
## SubTree [S1]
##
## Departure.and.Arrival.Time.Convenience = 0: 0 (0)
## Departure.and.Arrival.Time.Convenience = 5: 1 (14/3)
## Departure.and.Arrival.Time.Convenience in {1,2,3,4}:
## :...Class = Business:
## :...In.flight.Wifi.Service = 0: 1 (1)
## : In.flight.Wifi.Service = 5: 0 (51)
## Class in {Economy,Economy Plus}:
## :...Baggage.Handling in {1,2,5}: 1 (22/1)
## Baggage.Handling in {3,4}: 0 (24/5)
##
## SubTree [S2]
##
## Class in {Business,Economy}: 1 (10)
## Class = Economy Plus: 0 (2)
##
## SubTree [S3]
##
## Online.Boarding in {0,1}: 1 (38/5)
## Online.Boarding in {2,3}:
## :...In.flight.service in {1,2,3,4,5,6,7,8,11,15}: 0 (29/11)
## In.flight.service = 14: 1 (2)
## In.flight.service = 9:
## :...Online.Boarding = 2: 1 (5)
## : Online.Boarding = 3: 0 (14/5)
## In.flight.service = 10:
## :...Distance_Group = Medium-haul: 0 (11/3)
## : Distance_Group = Long-haul:
## : :...Gender = Female: 1 (5)
## : : Gender = Male: 0 (3/1)
## : Distance_Group = Short-haul:
## : :...Age_Range in {Adult,Children,Youth}: 1 (15/3)
## : Age_Range = Senior: 0 (2)
## In.flight.service = 12:
## :...Cleanliness in {2,4}: 0 (27/10)
## : Cleanliness = 3:
## : :...Class in {Business,Economy Plus}: 1 (7)
## : : Class = Economy: 0 (2)
## : Cleanliness = 1:
## : :...Online.Boarding = 3: 0 (6)
## : Online.Boarding = 2:
## : :...Seat.Comfort = 2: 0 (1)
## : Seat.Comfort in {3,4}: 1 (3)
## In.flight.service = 13:
## :...arrival.delay.status = Delayed: 0 (4)
## arrival.delay.status = On Time:
## :...Seat.Comfort in {2,4}: 1 (12/2)
## Seat.Comfort = 3:
## :...Comfortability in {3,4,5,6,7,8,10,12,13,14,15}: 0 (8/1)
## Comfortability in {9,11}: 1 (6/1)
##
## SubTree [S4]
##
## Departure.and.Arrival.Time.Convenience = 0: 0 (25)
## Departure.and.Arrival.Time.Convenience in {1,2,3,4,5}:
## :...Leg.Room.Service = 0: 1 (0)
## Leg.Room.Service in {1,2,3}:
## :...Baggage.Handling in {1,2,3}: 0 (42/3)
## : Baggage.Handling in {4,5}:
## : :...Flight.Distance > 2844: 0 (4)
## : Flight.Distance <= 2844:
## : :...On.board.Service in {0,1,3,4}: 1 (20)
## : On.board.Service in {2,5}:
## : :...Departure.and.Arrival.Time.Convenience = 2: 1 (2)
## : Departure.and.Arrival.Time.Convenience in {1,3,4,5}: 0 (3)
## Leg.Room.Service in {4,5}:
## :...Age_Range in {Children,Youth}:
## :...Baggage.Handling in {1,2,3,4}: 0 (7)
## : Baggage.Handling = 5: 1 (2)
## Age_Range in {Adult,Senior}:
## :...Baggage.Handling in {4,5}:
## :...In.flight.Service in {1,2}: 0 (5/1)
## : In.flight.Service in {0,3,4,5}: 1 (1446/1)
## Baggage.Handling in {1,2}:
## :...Leg.Room.Service = 5: 1 (3)
## : Leg.Room.Service = 4:
## : :...In.flight.Service in {0,1,2,5}: 0 (0)
## : In.flight.Service = 3: 1 (1)
## : In.flight.Service = 4:
## : :...On.board.Service = 2: 1 (1)
## : On.board.Service in {0,1,3,4,5}: 0 (22)
## Baggage.Handling = 3:
## :...In.flight.Service in {0,1}: 1 (0)
## In.flight.Service in {2,3}: 0 (5)
## In.flight.Service in {4,5}:
## :...In.flight.Wifi.Service in {1,2}: 1 (37)
## In.flight.Wifi.Service = 3:
## :...Gate.Location in {1,2}: 0 (5)
## Gate.Location = 3: 1 (20)
##
## SubTree [S5]
##
## Type.of.Travel = Business: 1 (53)
## Type.of.Travel = Personal: 0 (21)
##
## SubTree [S6]
##
## Customer.Type = First-time: 0 (79/1)
## Customer.Type = Returning:
## :...Type.of.Travel = Business: 1 (30/1)
## Type.of.Travel = Personal: 0 (21)
##
## SubTree [S7]
##
## On.board.Service in {0,1,2}: 0 (1647/20)
## On.board.Service = 5:
## :...Customer.Type = First-time: 0 (38/1)
## : Customer.Type = Returning:
## : :...In.flight.Service in {0,1,2}: 0 (14)
## : In.flight.Service in {3,4}:
## : :...Distance_Group = Short-haul: 0 (5)
## : Distance_Group in {Long-haul,Medium-haul}:
## : :...Comfortability in {3,4,6,7,8,9,11,12,13,14,15}: 1 (14)
## : Comfortability in {5,10}: 0 (3)
## On.board.Service in {3,4}:
## :...In.flight.Wifi.Service in {2,3}: 0 (1101/31)
## In.flight.Wifi.Service = 1:
## :...Gate.Location in {2,3}: 0 (167)
## Gate.Location = 1:
## :...In.flight.Entertainment in {0,1,2}: 0 (56/1)
## In.flight.Entertainment = 3:
## :...Customer.Type = First-time: 0 (3)
## Customer.Type = Returning:
## :...Baggage.Handling in {2,3,4}: 1 (26/3)
## Baggage.Handling = 1:
## :...Age <= 32: 1 (3)
## Age > 32: 0 (9)
##
## SubTree [S8]
##
## Baggage.Handling = 5: 0 (0)
## Baggage.Handling = 1: 1 (4)
## Baggage.Handling in {2,3,4}:
## :...Comfortability = 4: 1 (2)
## Comfortability in {3,5,6,7,8,9,10,11,12,13,14,15}: 0 (42)
##
## SubTree [S9]
##
## Baggage.Handling in {1,4,5}: 0 (15)
## Baggage.Handling in {2,3}:
## :...Food.and.Drink in {1,2,3}: 0 (16/5)
## Food.and.Drink in {0,4,5}: 1 (9/2)
##
## SubTree [S10]
##
## In.flight.Service in {0,5}: 1 (67)
## In.flight.Service in {1,2,3,4}:
## :...Check.in.Service in {0,5}: 1 (49)
## Check.in.Service in {1,2,3,4}:
## :...Baggage.Handling = 5: 1 (25)
## Baggage.Handling in {1,2,3,4}:
## :...Seat.Comfort in {0,5}: 1 (15)
## Seat.Comfort in {1,2,3,4}:
## :...Leg.Room.Service = 5: 1 (11)
## Leg.Room.Service in {0,1,2,3,4}:
## :...Online.Boarding = 0: 1 (0)
## Online.Boarding = 1:
## :...Baggage.Handling = 1: 0 (2)
## : Baggage.Handling in {2,3,4}: 1 (11)
## Online.Boarding = 2:
## :...In.flight.Entertainment = 3:
## : :...Seat.Comfort in {1,2}: 0 (14/1)
## : : Seat.Comfort in {3,4}: 1 (4/1)
## : In.flight.Entertainment = 2:
## : :...Ease.of.Online.Booking = 0: 1 (0)
## : Ease.of.Online.Booking in {2,4}: 0 (6)
## : Ease.of.Online.Booking in {1,3,5}:
## : :...Seat.Comfort in {1,3}: 0 (4/1)
## : Seat.Comfort in {2,4}: 1 (29)
## Online.Boarding = 3:
## :...Distance_Group = Short-haul:
## :...Age_Range in {Adult,Children}: 0 (8/1)
## : Age_Range in {Senior,Youth}: 1 (2)
## Distance_Group = Long-haul:
## :...In.flight.Entertainment = 2: 0 (20/2)
## : In.flight.Entertainment = 3:
## : :...Baggage.Handling in {1,3}: 1 (5)
## : Baggage.Handling in {2,4}: 0 (7)
## Distance_Group = Medium-haul:
## :...Ease.of.Online.Booking in {0,5}: 0 (3)
## Ease.of.Online.Booking in {2,4}: 1 (11)
## Ease.of.Online.Booking = 1:
## :...Departure.Delay <= 1: 0 (3)
## : Departure.Delay > 1: 1 (2)
## Ease.of.Online.Booking = 3:
## :...Departure.Delay <= 1: 0 (2)
## Departure.Delay > 1: 1 (3)
##
## SubTree [S11]
##
## In.flight.Wifi.Service = 1: 0 (16)
## In.flight.Wifi.Service in {2,3}:
## :...Distance_Group = Long-haul: 0 (10)
## Distance_Group in {Medium-haul,Short-haul}:
## :...Leg.Room.Service in {0,1,2,5}: 1 (21/2)
## Leg.Room.Service in {3,4}:
## :...In.flight.Entertainment in {0,1,4,5}: 0 (12/1)
## In.flight.Entertainment = 2: 1 (11/4)
## In.flight.Entertainment = 3:
## :...Age <= 47: 1 (3)
## Age > 47: 0 (6)
##
## SubTree [S12]
##
## In.flight.Wifi.Service = 1: 0 (747)
## In.flight.Wifi.Service in {2,3}:
## :...In.flight.Entertainment = 0: 0 (0)
## In.flight.Entertainment = 5: 1 (3)
## In.flight.Entertainment in {1,2,3,4}:
## :...In.flight.Service = 1:
## :...Flight.Entertainment in {0,1,2,3,5,8,9,10}: 0 (26/1)
## : Flight.Entertainment in {4,6,7}:
## : :...Baggage.Handling in {3,4}: 1 (22/6)
## : Baggage.Handling in {1,2}:
## : :...Class = Economy: 0 (38/6)
## : Class = Economy Plus:
## : :...Comfortability in {3,4,5,6,7,9,10,12,13,14,
## : : 15}: 1 (8/1)
## : Comfortability in {8,11}: 0 (3)
## In.flight.Service in {2,3,4}:
## :...Baggage.Handling = 1:
## :...Flight.Entertainment in {0,1,2,8,9,10}: 0 (0)
## : Flight.Entertainment in {3,5,7}: 1 (6)
## : Flight.Entertainment in {4,6}:
## : :...In.flight.Service = 2: 0 (36/6)
## : In.flight.Service = 3:
## : :...Flight.Distance <= 1484: 1 (5)
## : : Flight.Distance > 1484: 0 (8)
## : In.flight.Service = 4:
## : :...Pre.flight.service in {6,8}: 0 (3)
## : Pre.flight.service in {1,2,3,4,5,7,9,10,11,12,13,14,
## : 15}: 1 (7)
## Baggage.Handling in {2,3,4}:
## :...Online.Boarding = 0: 0 (0)
## Online.Boarding = 1:
## :...Seat.Comfort in {3,4}:
## : :...Distance_Group = Medium-haul: 0 (1)
## : : Distance_Group in {Long-haul,Short-haul}: 1 (9)
## : Seat.Comfort in {1,2}:
## : :...Comfortability in {3,4,5,6,7,10,11,12,13,14,
## : : 15}: 0 (30)
## : Comfortability in {8,9}:
## : :...Food.and.Drink in {0,1,2,3,5}: 0 (12/2)
## : Food.and.Drink = 4: 1 (4)
## Online.Boarding in {2,3}:
## :...Baggage.Handling = 2:
## :...In.flight.Service in {2,3}: 0 (557/35)
## : In.flight.Service = 4:
## : :...Distance_Group in {Long-haul,
## : : Medium-haul}: 0 (13/1)
## : Distance_Group = Short-haul: 1 (10/1)
## Baggage.Handling in {3,4}:
## :...In.flight.Service in {3,4}: 0 (1766/44)
## In.flight.Service = 2:
## :...Baggage.Handling = 3: 0 (94/2)
## Baggage.Handling = 4:
## :...Age <= 50: 1 (13/2)
## Age > 50: 0 (12/1)
##
## SubTree [S13]
##
## Pre.flight.service in {6,8}: 0 (14/4)
## Pre.flight.service in {1,2,3,4,5,7,9,10,11,12,13,14,15}: 1 (83/13)
##
## SubTree [S14]
##
## In.flight.Entertainment = 0: 1 (0)
## In.flight.Entertainment in {1,2,3}:
## :...Cleanliness in {0,1,2,5}: 0 (0)
## : Cleanliness = 3: 1 (8)
## : Cleanliness = 4:
## : :...In.flight.Service in {1,2}: 0 (20/4)
## : In.flight.Service = 5: 1 (4)
## In.flight.Entertainment in {4,5}:
## :...Baggage.Handling = 4: 1 (97/1)
## Baggage.Handling = 3:
## :...Departure.Delay <= 21: 1 (76/13)
## Departure.Delay > 21: 0 (3)
##
## SubTree [S15]
##
## In.flight.Entertainment = 2: 0 (3)
## In.flight.Entertainment in {0,1,3,4,5}: 1 (28/2)
##
## SubTree [S16]
##
## In.flight.Entertainment in {0,1,2,3,4}: 1 (21)
## In.flight.Entertainment = 5: 0 (14/1)
##
## SubTree [S17]
##
## Flight.Entertainment in {0,1,2,3,4,5,6,8,9,10}: 0 (67/15)
## Flight.Entertainment = 7:
## :...Seat.Comfort = 3: 0 (1)
## Seat.Comfort = 4: 1 (12)
##
## SubTree [S18]
##
## Departure.and.Arrival.Time.Convenience in {1,2,3,4}: 0 (88/6)
## Departure.and.Arrival.Time.Convenience in {0,5}:
## :...Comfortability in {13,14}: 0 (8)
## Comfortability in {11,12,15}:
## :...Check.in.Service = 3: 0 (7/2)
## Check.in.Service in {4,5}: 1 (8)
##
## SubTree [S19]
##
## In.flight.Service = 1: 1 (4)
## In.flight.Service = 5: 0 (1)
##
## SubTree [S20]
##
## In.flight.service in {1,2,3,4,5,6,7,8,9,10,11,14,15}: 1 (19/1)
## In.flight.service = 12:
## :...delay.status = Delayed: 1 (2)
## : delay.status = On-time: 0 (8/1)
## In.flight.service = 13:
## :...Baggage.Handling = 3: 0 (1)
## Baggage.Handling in {1,2,5}: 1 (3)
## Baggage.Handling = 4:
## :...Departure.Delay <= 1: 0 (3)
## Departure.Delay > 1: 1 (2)
##
## SubTree [S21]
##
## In.flight.Entertainment = 1: 0 (1)
## In.flight.Entertainment in {0,2,4}: 1 (62/19)
## In.flight.Entertainment = 3:
## :...Departure.and.Arrival.Time.Convenience = 2: 1 (1)
## : Departure.and.Arrival.Time.Convenience in {1,3,5}: 0 (3)
## : Departure.and.Arrival.Time.Convenience = 0:
## : :...Check.in.Service in {3,4}: 1 (4)
## : : Check.in.Service = 5: 0 (2)
## : Departure.and.Arrival.Time.Convenience = 4:
## : :...delay.status = Delayed: 0 (4)
## : delay.status = On-time:
## : :...In.flight.Service = 4: 0 (5/1)
## : In.flight.Service in {1,5}: 1 (7/1)
## In.flight.Entertainment = 5:
## :...Baggage.Handling in {1,5}: 1 (33/6)
## Baggage.Handling = 2:
## :...Seat.Comfort = 2: 0 (1)
## : Seat.Comfort in {0,1,3,4,5}: 1 (3)
## Baggage.Handling = 3:
## :...Departure.and.Arrival.Time.Convenience in {0,3}: 1 (3)
## : Departure.and.Arrival.Time.Convenience in {1,2,4,5}: 0 (7/1)
## Baggage.Handling = 4:
## :...Flight.Distance > 253: 1 (36/10)
## Flight.Distance <= 253:
## :...On.board.Service = 3: 1 (2)
## On.board.Service in {4,5}: 0 (5)
##
## SubTree [S22]
##
## Ease.of.Online.Booking in {0,1,2,3,4}: 0 (9)
## Ease.of.Online.Booking = 5: 1 (2)
##
## SubTree [S23]
##
## Distance_Group in {Long-haul,Short-haul}: 1 (13/3)
## Distance_Group = Medium-haul: 0 (3)
##
## SubTree [S24]
##
## Pre.flight.service in {10,14,15}: 0 (6/2)
## Pre.flight.service in {1,2,3,4,5,6,7,8,9,12}: 1 (42/12)
## Pre.flight.service = 11:
## :...Distance_Group = Long-haul: 0 (3)
## : Distance_Group in {Medium-haul,Short-haul}: 1 (45/10)
## Pre.flight.service = 13:
## :...Comfortability in {11,14}: 1 (19/7)
## Comfortability in {12,15}: 0 (8/1)
## Comfortability = 13:
## :...Age <= 31: 1 (4/1)
## Age > 31:
## :...In.flight.service in {9,14}: 1 (3)
## In.flight.service in {1,2,3,4,5,6,7,8,10,11,12,13,15}: 0 (7)
##
## SubTree [S25]
##
## On.board.Service in {0,1,3,4,5}: 0 (23)
## On.board.Service = 2:
## :...Check.in.Service in {1,2,4}: 0 (25/4)
## Check.in.Service = 3: 1 (7/1)
##
## SubTree [S26]
##
## In.flight.Wifi.Service in {1,2,4}: 0 (91/2)
## In.flight.Wifi.Service = 3:
## :...Customer.Type = First-time: 0 (2)
## Customer.Type = Returning: 1 (3)
##
## SubTree [S27]
##
## Age_Range in {Children,Youth}: 1 (0)
## Age_Range = Senior: 0 (4)
## Age_Range = Adult:
## :...Check.in.Service in {1,2}: 0 (5/1)
## Check.in.Service = 4: 1 (12/2)
## Check.in.Service = 3:
## :...Food.and.Drink = 2: 0 (2)
## Food.and.Drink in {0,1,3,4,5}: 1 (6)
##
## SubTree [S28]
##
## In.flight.Entertainment = 1: 0 (2)
## In.flight.Entertainment in {0,2,3,4,5}: 1 (15)
##
## SubTree [S29]
##
## In.flight.Entertainment in {0,1}: 0 (0)
## In.flight.Entertainment in {4,5}:
## :...Customer.Type = First-time: 0 (6)
## : Customer.Type = Returning: 1 (7/1)
## In.flight.Entertainment in {2,3}:
## :...Seat.Comfort in {0,1,3,4}: 0 (621/13)
## Seat.Comfort in {2,5}:
## :...On.board.Service in {0,1,2,4,5}: 0 (8)
## On.board.Service = 3: 1 (7)
##
## SubTree [S30]
##
## Age_Range = Adult: 0 (14)
## Age_Range in {Children,Senior,Youth}:
## :...Baggage.Handling in {2,3}: 0 (13/4)
## Baggage.Handling = 4: 1 (8)
##
## SubTree [S31]
##
## In.flight.Wifi.Service = 4: 0 (84/6)
## In.flight.Wifi.Service = 1:
## :...Seat.Comfort in {0,1,2,3,5}: 0 (4)
## : Seat.Comfort = 4: 1 (4)
## In.flight.Wifi.Service = 2:
## :...Baggage.Handling in {2,3}: 0 (11/1)
## Baggage.Handling = 4: 1 (3)
##
## SubTree [S32]
##
## Seat.Comfort = 0: 0 (0)
## Seat.Comfort in {2,5}: 1 (9)
## Seat.Comfort = 1:
## :...In.flight.Entertainment in {2,4}: 0 (3)
## : In.flight.Entertainment in {0,1,3,5}: 1 (5)
## Seat.Comfort = 3:
## :...Leg.Room.Service in {0,1,2,3,5}: 0 (32)
## : Leg.Room.Service = 4:
## : :...Cleanliness = 1: 1 (3)
## : Cleanliness in {0,2,3,4,5}: 0 (3)
## Seat.Comfort = 4:
## :...Distance_Group = Medium-haul:
## :...Cleanliness in {0,1,2,4,5}: 0 (37/8)
## : Cleanliness = 3: 1 (6)
## Distance_Group = Long-haul:
## :...In.flight.Wifi.Service = 1: 1 (2)
## : In.flight.Wifi.Service = 4: 0 (19)
## : In.flight.Wifi.Service = 2:
## : :...Baggage.Handling = 2: 0 (6)
## : Baggage.Handling in {3,4}: 1 (3)
## Distance_Group = Short-haul:
## :...Flight.Entertainment in {3,5}: 0 (9/3)
## Flight.Entertainment in {6,9}: 1 (2)
## Flight.Entertainment = 7:
## :...Gate.Location = 1: 0 (1)
## : Gate.Location in {0,2,3,4,5}: 1 (7)
## Flight.Entertainment = 8:
## :...Pre.flight.service in {6,12,13}: 1 (14/3)
## Pre.flight.service in {1,2,3,4,5,9,10,14,15}: 0 (35/12)
## Pre.flight.service = 7:
## :...Class in {Business,Economy}: 0 (8/1)
## : Class = Economy Plus: 1 (2)
## Pre.flight.service = 8:
## :...Gender = Female: 0 (5)
## : Gender = Male: 1 (6/1)
## Pre.flight.service = 11:
## :...Baggage.Handling = 2: 1 (2)
## Baggage.Handling = 4: 0 (9/2)
## Baggage.Handling = 3:
## :...Class = Economy: 0 (1)
## Class in {Business,Economy Plus}: 1 (3)
##
## SubTree [S33]
##
## Class = Business: 1 (81/23)
## Class in {Economy,Economy Plus}: 0 (20/5)
##
## SubTree [S34]
##
## In.flight.service in {1,2,3,4,5,6,7,8,9,10,11,12,13,14}: 1 (23/1)
## In.flight.service = 15: 0 (3)
##
## SubTree [S35]
##
## Check.in.Service = 0: 1 (0)
## Check.in.Service in {1,2}:
## :...Flight.Distance <= 1259: 0 (24/2)
## : Flight.Distance > 1259: 1 (2)
## Check.in.Service in {3,4}:
## :...Ease.of.Online.Booking in {0,1,2,3,5}: 1 (5)
## Ease.of.Online.Booking = 4:
## :...Comfortability in {3,7}: 0 (33/13)
## Comfortability = 4: 1 (12/3)
## Comfortability = 5:
## :...In.flight.service in {9,11}: 1 (11/3)
## : In.flight.service in {1,2,3,4,5,6,7,8,10,12,13,14,15}: 0 (5)
## Comfortability = 6:
## :...In.flight.service in {1,2,3,4,5,6,7,8,10,11,13,14,15}: 1 (16/2)
## : In.flight.service = 9: 0 (5/1)
## : In.flight.service = 12:
## : :...Flight.Distance <= 383: 0 (3)
## : Flight.Distance > 383: 1 (2)
## Comfortability = 8:
## :...Leg.Room.Service in {0,1,5}: 0 (0)
## : Leg.Room.Service in {3,4}: 1 (10/2)
## : Leg.Room.Service = 2:
## : :...Departure.Delay <= 2: 0 (12/2)
## : Departure.Delay > 2: 1 (4/1)
## Comfortability = 9:
## :...In.flight.service in {1,2,3,4,5,6,7,8,14,15}: 1 (0)
## : In.flight.service in {9,13}: 0 (4)
## : In.flight.service = 10:
## : :...Flight.Distance <= 588: 0 (3)
## : : Flight.Distance > 588: 1 (3)
## : In.flight.service = 11:
## : :...Departure.Delay.Duration <= 3: 0 (3.6/0.4)
## : : Departure.Delay.Duration > 3: 1 (5.4/1.8)
## : In.flight.service = 12:
## : :...Departure.Delay <= 2: 1 (13/2)
## : Departure.Delay > 2: 0 (2)
## Comfortability = 10:
## :...delay.status = On-time: 1 (34/8)
## delay.status = Delayed:
## :...Age <= 26: 1 (2)
## Age > 26: 0 (4)
##
## SubTree [S36]
##
## In.flight.Wifi.Service in {1,2}: 1 (0)
## In.flight.Wifi.Service = 3: 0 (3)
## In.flight.Wifi.Service = 4:
## :...In.flight.Service in {1,2}: 0 (19/5)
## In.flight.Service = 4:
## :...Age_Range in {Children,Youth}: 1 (88/13)
## Age_Range in {Adult,Senior}:
## :...Comfortability in {5,6}: 0 (13/5)
## Comfortability in {3,7,8,10}: 1 (43/14)
## Comfortability = 4:
## :...Gate.Location = 1: 1 (4)
## : Gate.Location in {0,2,3,4,5}: 0 (4)
## Comfortability = 9:
## :...Age > 43: 1 (4)
## Age <= 43:
## :...Class in {Business,Economy}: 0 (10)
## Class = Economy Plus: 1 (1)
##
## SubTree [S37]
##
## In.flight.Entertainment in {1,2}: 0 (5/1)
## In.flight.Entertainment in {0,3,4,5}: 1 (101/1)
##
## SubTree [S38]
##
## In.flight.Wifi.Service = 1: 1 (0)
## In.flight.Wifi.Service in {2,3}: 0 (5)
## In.flight.Wifi.Service = 4:
## :...Class = Business: 1 (51/12)
## Class = Economy Plus: 0 (2)
## Class = Economy:
## :...Comfortability = 3: 1 (1)
## Comfortability in {4,5,6}: 0 (14/2)
## Comfortability = 7:
## :...Seat.Comfort = 1: 1 (4)
## : Seat.Comfort in {0,2,3,4,5}: 0 (6/1)
## Comfortability = 8:
## :...Distance_Group in {Long-haul,Medium-haul}: 0 (2)
## : Distance_Group = Short-haul: 1 (5)
## Comfortability = 9:
## :...In.flight.Service in {1,4}: 1 (8/1)
## : In.flight.Service = 2: 0 (4)
## Comfortability = 10:
## :...In.flight.service in {1,2,3,4,5,6,7,8,12,13,14,15}: 0 (5)
## In.flight.service in {9,10,11}: 1 (4)
##
## SubTree [S39]
##
## In.flight.Wifi.Service = 1: 1 (0)
## In.flight.Wifi.Service in {2,3}: 0 (6)
## In.flight.Wifi.Service = 4:
## :...Pre.flight.service in {9,10}: 0 (19/4)
## Pre.flight.service in {1,2,3,4,5,6,7,8,11,12,13,14,15}: 1 (122/41)
##
## SubTree [S40]
##
## Age_Range in {Adult,Children,Senior}: 0 (228/19)
## Age_Range = Youth:
## :...Departure.and.Arrival.Time.Convenience in {0,5}: 1 (31/11)
## Departure.and.Arrival.Time.Convenience in {1,2,3,4}: 0 (110/19)
##
## SubTree [S41]
##
## Seat.Comfort in {1,2}: 0 (29)
## Seat.Comfort = 5: 1 (5)
##
## SubTree [S42]
##
## In.flight.Entertainment = 0: 0 (0)
## In.flight.Entertainment = 1:
## :...Pre.flight.service in {1,2,3,4,5,6,7,8,9,10,11,13,14,15}: 0 (47)
## : Pre.flight.service = 12:
## : :...Class = Business: 1 (3)
## : Class in {Economy,Economy Plus}: 0 (2)
## In.flight.Entertainment in {2,3,4,5}:
## :...In.flight.Service = 1:
## :...Cleanliness = 0: 0 (1)
## : Cleanliness in {1,2,3,5}: 1 (10)
## : Cleanliness = 4:
## : :...Food.and.Drink = 3: 0 (2)
## : Food.and.Drink in {0,1,2,4,5}: 1 (98/10)
## In.flight.Service in {2,4}:
## :...In.flight.Wifi.Service = 2:
## :...Gate.Location in {1,3}: 0 (4)
## : Gate.Location in {0,2,4,5}: 1 (38/2)
## In.flight.Wifi.Service in {1,3,4}:
## :...Cleanliness = 0: 0 (0)
## Cleanliness in {1,2}:
## :...Baggage.Handling in {1,3}: 0 (32)
## : Baggage.Handling in {2,4}:
## : :...Leg.Room.Service in {0,1,3,5}: 0 (47/1)
## : Leg.Room.Service in {2,4}:
## : :...In.flight.Wifi.Service in {1,3}: 1 (15)
## : In.flight.Wifi.Service = 4:
## : :...Class = Business: 0 (111/12)
## : Class in {Economy,Economy Plus}:
## : :...Comfortability in {3,4,5,6,7}: 0 (0)
## : Comfortability in {8,10}:
## : :...Flight.Distance <= 1015: 0 (29/4)
## : : Flight.Distance > 1015: 1 (5/1)
## : Comfortability = 9:
## : :...Departure.Delay <= 12: 1 (27/7)
## : Departure.Delay > 12: 0 (5)
## Cleanliness in {3,4,5}:
## :...Baggage.Handling in {3,4}:
## :...In.flight.Service = 2:
## : :...In.flight.Entertainment = 5: 1 (0)
## : : In.flight.Entertainment = 2: 0 (18/1)
## : : In.flight.Entertainment in {3,4}:
## : : :...Baggage.Handling = 4: 1 (23)
## : : Baggage.Handling = 3:
## : : :...Flight.Distance <= 1038: 1 (36/7)
## : : Flight.Distance > 1038: 0 (6)
## : In.flight.Service = 4:
## : :...Flight.Entertainment in {6,8}: 0 (216/53)
## : Flight.Entertainment in {3,5,7,9}: [S43]
## Baggage.Handling in {1,2}:
## :...Food.and.Drink = 0: 1 (0)
## Food.and.Drink = 1: 0 (9)
## Food.and.Drink in {2,3,4,5}:
## :...Gate.Location = 0: 1 (0)
## Gate.Location in {2,5}:
## :...In.flight.Entertainment in {2,5}: 0 (11)
## : In.flight.Entertainment in {3,4}:
## : :...Class = Business: 0 (2)
## : Class = Economy Plus: 1 (6)
## : Class = Economy:
## : :...In.flight.Service = 2: 0 (16/7)
## : In.flight.Service = 4: 1 (8)
## Gate.Location in {1,3,4}:
## :...Ease.of.Online.Booking = 0: 0 (2)
## Ease.of.Online.Booking in {2,3,5}: 1 (48/7)
## Ease.of.Online.Booking = 4:
## :...In.flight.Wifi.Service = 1: 0 (6)
## : In.flight.Wifi.Service in {3,4}: 1 (40/7)
## Ease.of.Online.Booking = 1:
## :...In.flight.Wifi.Service in {1,3}: 1 (31/2)
## In.flight.Wifi.Service = 4:
## :...In.flight.service in {1,2,3,4,5,8,13,14,
## : 15}: 1 (0)
## In.flight.service in {7,10}: 0 (5)
## In.flight.service in {6,9,11,12}:
## :...Age_Range in {Adult,Children,
## : Senior}: 1 (8)
## Age_Range = Youth: 0 (1)
##
## SubTree [S43]
##
## Departure.and.Arrival.Time.Convenience in {1,3}: 1 (19)
## Departure.and.Arrival.Time.Convenience in {0,2,4,5}:
## :...On.board.Service in {1,2,4}: 0 (16/2)
## On.board.Service = 3: 1 (3)
##
##
## Evaluation on training data (90916 cases):
##
## Decision Tree
## ----------------
## Size Errors
##
## 517 2606( 2.9%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 50831 628 (a): class 0
## 1978 37479 (b): class 1
##
##
## Attribute usage:
##
## 100.00% In.flight.Wifi.Service
## 89.27% Online.Boarding
## 76.35% Type.of.Travel
## 68.82% Class
## 48.91% Customer.Type
## 38.46% Check.in.Service
## 30.18% Cleanliness
## 29.90% Comfortability
## 24.15% Flight.Entertainment
## 23.06% In.flight.Service
## 22.20% Leg.Room.Service
## 21.70% Gate.Location
## 21.41% In.flight.Entertainment
## 17.78% Baggage.Handling
## 14.86% Ease.of.Online.Booking
## 14.38% Seat.Comfort
## 12.52% On.board.Service
## 6.40% Age
## 3.83% Age_Range
## 3.22% In.flight.service
## 2.30% Departure.and.Arrival.Time.Convenience
## 1.90% Food.and.Drink
## 0.95% Flight.Distance
## 0.63% Pre.flight.service
## 0.51% Distance_Group
## 0.27% Gender
## 0.25% Departure.Delay
## 0.12% arrival.delay.status
## 0.07% delay.status
## 0.02% Arrival.Delay
## 0.01% Departure.Delay.Duration
##
##
## Time: 1.3 secs
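Each leaf in the C5.0 printout ends with `(n)` or `(n/e)`. Here `n` is the number of training cases that reach the leaf, and `e` is how many of those cases the leaf misclassifies. Fractional counts such as `(3.6/0.4)` typically come from cases with missing attribute values, which are split fractionally across branches. Leaves marked `(0)` cover no training cases at all. A small parser makes it easier to read individual rules as leaf accuracies. `leaf_stats()` is a hypothetical helper written for illustration, not part of the C50 package, and it only handles the leaf-line format shown above.

```r
# Parse the "(n)" or "(n/e)" suffix that C5.0 prints after each leaf.
# leaf_stats() is a hypothetical helper for this report, not a C50 function.
leaf_stats <- function(leaf_line) {
  # Keep only the text inside the final pair of parentheses, e.g. "81/23" or "3.6/0.4"
  inside <- sub(".*\\(([0-9.]+(/[0-9.]+)?)\\)\\s*$", "\\1", leaf_line)
  parts <- as.numeric(strsplit(inside, "/", fixed = TRUE)[[1]])
  n <- parts[1]
  # A leaf printed as "(n)" has no misclassified training cases
  errors <- if (length(parts) == 2) parts[2] else 0
  data.frame(
    cases = n,
    errors = errors,
    # Leaf accuracy is undefined for empty "(0)" leaves, so return NA there
    accuracy = if (n > 0) (n - errors) / n else NA_real_
  )
}

# Leaf lines copied from the printout above
leaf_stats("Class = Business: 1 (81/23)")
leaf_stats("In.flight.Entertainment in {0,3,4,5}: 1 (101/1)")
leaf_stats("Check.in.Service = 0: 1 (0)")
```

For example, the leaf `Class = Business: 1 (81/23)` in SubTree [S33] classifies 58 of its 81 training cases correctly, about 72%.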
print("Confusion matrix based on testing data")
## [1] "Confusion matrix based on testing data"
pred.test <- predict(model, testing)
gmodels::CrossTable(testing$Satisfaction, pred.test,
                    prop.chisq = FALSE,
                    prop.c = FALSE,
                    prop.r = FALSE,
                    prop.t = FALSE,
                    dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 38964
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 21509 | 484 | 21993 |
## --------------------|-----------|-----------|-----------|
## 1 | 1069 | 15902 | 16971 |
## --------------------|-----------|-----------|-----------|
## Column Total | 22578 | 16386 | 38964 |
## --------------------|-----------|-----------|-----------|
##
##
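# Calculating accuracy: share of testing cases whose predicted class matches the actual class
# (equivalent to (21509 + 15902) / 38964 from the table above)
accuracy <- mean(pred.test == testing$Satisfaction)
print(accuracy)
## [1] 0.9601427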
Accuracy on the testing data is already high, at about 96.0%.
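An accuracy of about 96% does not show how the errors are split between the two classes. The sketch below uses counts typed in from the CrossTable output above and from the training-set matrix in the C5.0 summary of the same model, so it has to be updated by hand if the seed or split changes. It computes precision, recall, and F1, treating satisfied (`1`) as the positive class. `classification_metrics()` is an illustrative helper, not a function from gmodels or C50.

```r
# Testing counts from the CrossTable output (rows = actual, columns = predicted)
test_cm <- matrix(c(21509,   484,
                     1069, 15902),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Training counts from the "(a) (b) <-classified as" matrix in summary(model)
train_cm <- matrix(c(50831,   628,
                      1978, 37479),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Illustrative helper: metrics for one positive class of a 2 x 2 confusion matrix
classification_metrics <- function(cm, positive = "1") {
  tp <- cm[positive, positive]
  fp <- sum(cm[, positive]) - tp  # predicted positive but actually negative
  fn <- sum(cm[positive, ]) - tp  # actually positive but predicted negative
  accuracy <- sum(diag(cm)) / sum(cm)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  c(accuracy = accuracy,
    precision = precision,
    recall = recall,
    f1 = 2 * precision * recall / (precision + recall))
}

round(rbind(training = classification_metrics(train_cm),
            testing = classification_metrics(test_cm)), 4)
```

On these counts, training accuracy is about 97.1% against about 96.0% on the testing data, a gap of roughly one percentage point. Recall for class 1 (about 0.937) is lower than precision (about 0.970): the model labels a satisfied passenger as unsatisfied (1,069 cases) more often than the reverse (484 cases).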
Next, we fit the same model on the dataset that has already been log-transformed.
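For a tree model, a strictly increasing transformation such as log() leaves the ordering of a numeric predictor unchanged. The candidate partitions that C5.0 can choose from are therefore the same before and after the transformation, and only the scale of the printed thresholds changes. The sketch below checks the ordering property on made-up values. It then converts several log-scale cut points from the tree that follows, for example Age.log <= 3.583519 and Flight.Distance.log <= 7.305188, back to original units. It assumes the .log columns were created with the natural logarithm, log(). The recovered values are whole numbers, which is consistent with that assumption but does not confirm it.

```r
# Made-up ages and distances, not taken from the survey data
age <- c(7, 22, 36, 51, 85)
distance <- c(67, 284, 1015, 1488, 4983)

# A strictly increasing transform keeps the ranking, so a tree sees the same orderings
stopifnot(identical(order(age), order(log(age))))
stopifnot(identical(order(distance), order(log(distance))))

# Log-scale cut points printed by the log-transformed C5.0 tree
# (assumes the .log columns were built with the natural log, log())
age_cuts_log <- c(3.135494, 3.218876, 3.433987, 3.583519, 3.850147)
distance_cuts_log <- c(5.648974, 6.133398, 7.305188)

round(exp(age_cuts_log))       # ages in years
round(exp(distance_cuts_log))  # flight distance in the dataset's original units
```

Under that assumption, Age.log <= 3.583519 reads as Age <= 36, and Flight.Distance.log <= 7.305188 reads as Flight.Distance <= 1488.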
set.seed(46748717)
library(rsample)
proportion <- 0.7
split <- initial_split(data_clsf_log, prop = proportion)
training <- training(split)
testing <- testing(split)
model <- C50::C5.0(Satisfaction ~ .,
                   data = training)
summary(model)
##
## Call:
## C5.0.formula(formula = Satisfaction ~ ., data = training)
##
##
## C5.0 [Release 2.07 GPL Edition] Fri Apr 4 11:53:09 2025
## -------------------------------
##
## Class specified by attribute `outcome'
##
## Read 90916 cases (34 attributes) from undefined.data
##
## Decision tree:
##
## In.flight.Wifi.Service in {0,5}:
## :...Cleanliness = 0: 0 (4)
## : Cleanliness in {1,2,3,4,5}:
## : :...Ease.of.Online.Booking in {0,5}:
## : :...Flight.Entertainment in {0,1,2,3,4,5,7,8,9}: 1 (6785)
## : : Flight.Entertainment in {6,10}:
## : : :...In.flight.Service in {0,1,2,4,5}: 1 (2966/7)
## : : In.flight.Service = 3:
## : : :...Online.Boarding in {0,1,2,3,5}: 1 (192)
## : : Online.Boarding = 4:
## : : :...Cleanliness in {1,2,3,4}: 1 (12)
## : : Cleanliness = 5: 0 (11/2)
## : Ease.of.Online.Booking in {1,2,3,4}:
## : :...Online.Boarding in {0,1,5}: 1 (1821/3)
## : Online.Boarding in {2,3,4}:
## : :...In.flight.Entertainment in {0,1,2,3,4}: 1 (448/3)
## : In.flight.Entertainment = 5:
## : :...Leg.Room.Service = 0: 1 (0)
## : Leg.Room.Service = 5:
## : :...On.board.Service in {0,1,3,4,5}: 1 (341)
## : : On.board.Service = 2:
## : : :...Age.log <= 3.583519: 0 (3)
## : : Age.log > 3.583519: 1 (3)
## : Leg.Room.Service in {1,2,3,4}:
## : :...Customer.Type = First-time: 1 (15)
## : Customer.Type = Returning:
## : :...Type.of.Travel = Personal: 1 (15)
## : Type.of.Travel = Business:
## : :...Check.in.Service = 0: 0 (0)
## : Check.in.Service = 5: 1 (6)
## : Check.in.Service in {1,2,3,4}: [S1]
## In.flight.Wifi.Service in {1,2,3,4}:
## :...Online.Boarding in {0,1,2,3}:
## :...In.flight.Wifi.Service = 4:
## : :...Gate.Location = 0: 0 (0)
## : : Gate.Location in {1,2,3,5}:
## : : :...Class = Business:
## : : : :...Customer.Type = Returning: 0 (475/12)
## : : : : Customer.Type = First-time:
## : : : : :...Age.log <= 3.218876: 1 (28/2)
## : : : : Age.log > 3.218876:
## : : : : :...In.flight.service in {1,2,3,4,5,6,7,8,12,
## : : : : : 15}: 1 (11)
## : : : : In.flight.service in {9,10,11,13,14}:
## : : : : :...Pre.flight.service in {6,8,9}: 1 (19/6)
## : : : : Pre.flight.service in {1,2,3,4,5,7,12,13,14,
## : : : : : 15}: 0 (1)
## : : : : Pre.flight.service = 10:
## : : : : :...Departure.Delay <= 5: 0 (16/4)
## : : : : : Departure.Delay > 5: 1 (2)
## : : : : Pre.flight.service = 11:
## : : : : :...In.flight.Service = 4: 1 (7/2)
## : : : : In.flight.Service in {0,1,2,3,
## : : : : 5}: 0 (6)
## : : : Class in {Economy,Economy Plus}:
## : : : :...Type.of.Travel = Personal: 0 (500/143)
## : : : Type.of.Travel = Business:
## : : : :...Online.Boarding = 0: 1 (0)
## : : : Online.Boarding in {1,2}:
## : : : :...In.flight.Service = 0: 1 (0)
## : : : : In.flight.Service in {1,2}: 0 (7)
## : : : : In.flight.Service in {3,4,5}:
## : : : : :...Flight.Distance.log <= 7.305188: 1 (215/31)
## : : : : Flight.Distance.log > 7.305188: 0 (13/3)
## : : : Online.Boarding = 3:
## : : : :...Check.in.Service = 0: 0 (0)
## : : : Check.in.Service = 5: 1 (32/7)
## : : : Check.in.Service in {1,2,3,4}:
## : : : :...Seat.Comfort = 0: 0 (0)
## : : : Seat.Comfort in {1,5}:
## : : : :...Gender = Male: 0 (10/1)
## : : : : Gender = Female:
## : : : : :...Leg.Room.Service = 0: 1 (0)
## : : : : Leg.Room.Service in {3,5}: 0 (5)
## : : : : Leg.Room.Service in {1,2,4}:
## : : : : :...Baggage.Handling in {1,2,4,
## : : : : : 5}: 1 (39/2)
## : : : : Baggage.Handling = 3: 0 (5/1)
## : : : Seat.Comfort in {2,3,4}:
## : : : :...Cleanliness = 0: 0 (0)
## : : : Cleanliness = 5:
## : : : :...Customer.Type = First-time: 0 (2)
## : : : : Customer.Type = Returning: 1 (12)
## : : : Cleanliness in {1,2,3,4}:
## : : : :...Baggage.Handling in {2,3,
## : : : : 4}: 0 (193/42)
## : : : Baggage.Handling = 1:
## : : : :...Comfortability in {3,4,5,6,7,8,10,
## : : : : : 12,14,
## : : : : : 15}: 0 (5)
## : : : : Comfortability in {9,11,13}: 1 (5)
## : : : Baggage.Handling = 5:
## : : : :...Gender = Female: 0 (11/3)
## : : : Gender = Male: 1 (5)
## : : Gate.Location = 4:
## : : :...Type.of.Travel = Personal: 0 (226/52)
## : : Type.of.Travel = Business:
## : : :...Customer.Type = First-time:
## : : :...Class = Business:
## : : : :...arrival.delay.status = Delayed: 0 (2)
## : : : : arrival.delay.status = On Time: 1 (33/4)
## : : : Class in {Economy,Economy Plus}:
## : : : :...Baggage.Handling in {1,5}: 1 (9/3)
## : : : Baggage.Handling in {2,3,4}: 0 (31/2)
## : : Customer.Type = Returning:
## : : :...In.flight.Entertainment = 0: 1 (0)
## : : In.flight.Entertainment in {2,3,5}:
## : : :...Class = Economy Plus: 1 (3/1)
## : : : Class = Economy:
## : : : :...Cleanliness in {0,1,2,3,5}: 1 (6)
## : : : : Cleanliness = 4: 0 (3)
## : : : Class = Business:
## : : : :...Baggage.Handling in {1,2,3,5}: 1 (373/1)
## : : : Baggage.Handling = 4:
## : : : :...In.flight.Service in {0,1,4,5}: 1 (62)
## : : : In.flight.Service in {2,3}: 0 (11/3)
## : : In.flight.Entertainment in {1,4}:
## : : :...Seat.Comfort in {0,1,5}: 1 (140/1)
## : : Seat.Comfort in {2,3,4}:
## : : :...Leg.Room.Service = 0: 1 (0)
## : : Leg.Room.Service in {1,2,3}: 0 (34/6)
## : : Leg.Room.Service in {4,5}:
## : : :...Cleanliness in {0,5}: 1 (43)
## : : Cleanliness in {1,2,3,4}:
## : : :...Check.in.Service in {0,
## : : : 5}: 1 (30)
## : : Check.in.Service in {1,2,3,4}:
## : : :...Baggage.Handling in {1,2,
## : : : 5}: 1 (14)
## : : Baggage.Handling = 3: [S2]
## : : Baggage.Handling = 4: [S3]
## : In.flight.Wifi.Service in {1,2,3}:
## : :...Class = Business:
## : :...In.flight.Entertainment in {4,5}:
## : : :...Customer.Type = First-time: 0 (1417/51)
## : : : Customer.Type = Returning:
## : : : :...Type.of.Travel = Personal: 0 (338)
## : : : Type.of.Travel = Business:
## : : : :...Gate.Location = 0: 1 (0)
## : : : Gate.Location in {4,5}: 0 (59)
## : : : Gate.Location in {1,2,3}: [S4]
## : : In.flight.Entertainment in {0,1,2,3}:
## : : :...Cleanliness = 5:
## : : :...Type.of.Travel = Personal: 0 (16)
## : : : Type.of.Travel = Business:
## : : : :...Customer.Type = First-time: 0 (3)
## : : : Customer.Type = Returning: 1 (46/2)
## : : Cleanliness in {0,1,2,3,4}:
## : : :...Gate.Location in {0,4,5}: 0 (2747/12)
## : : Gate.Location in {1,2,3}:
## : : :...Flight.Entertainment in {0,1,7,8,9,
## : : : 10}: 0 (0)
## : : Flight.Entertainment in {2,4,6}:
## : : :...Check.in.Service = 0: 0 (0)
## : : : Check.in.Service = 5:
## : : : :...Customer.Type = First-time: 0 (167/4)
## : : : : Customer.Type = Returning:
## : : : : :...Type.of.Travel = Business: 1 (68/1)
## : : : : Type.of.Travel = Personal: 0 (24)
## : : : Check.in.Service in {1,2,3,4}:
## : : : :...Seat.Comfort = 0: 0 (0)
## : : : Seat.Comfort = 5:
## : : : :...Customer.Type = First-time: 0 (33/1)
## : : : : Customer.Type = Returning:
## : : : : :...Type.of.Travel = Business: 1 (19/1)
## : : : : Type.of.Travel = Personal: 0 (7)
## : : : Seat.Comfort in {1,2,3,4}:
## : : : :...In.flight.Service = 5:
## : : : :...Customer.Type = First-time: 0 (193/3)
## : : : : Customer.Type = Returning: [S5]
## : : : In.flight.Service in {0,1,2,3,4}:
## : : : :...Baggage.Handling = 5: [S6]
## : : : Baggage.Handling in {1,2,3,4}: [S7]
## : : Flight.Entertainment in {3,5}:
## : : :...Customer.Type = First-time: 0 (612/18)
## : : Customer.Type = Returning:
## : : :...Type.of.Travel = Personal: 0 (175)
## : : Type.of.Travel = Business:
## : : :...Flight.Distance.log <= 5.648974: 0 (48/1)
## : : Flight.Distance.log > 5.648974:
## : : :...In.flight.Entertainment in {0,1}: [S8]
## : : In.flight.Entertainment in {2,3}:
## : : :...Cleanliness = 0: 1 (0)
## : : Cleanliness = 1: 0 (40/12)
## : : Cleanliness in {2,3,4}: [S9]
## : Class in {Economy,Economy Plus}:
## : :...Type.of.Travel = Personal: 0 (16721)
## : Type.of.Travel = Business:
## : :...Customer.Type = First-time: 0 (7240/40)
## : Customer.Type = Returning:
## : :...Check.in.Service = 0: 0 (0)
## : Check.in.Service = 5: 1 (103/1)
## : Check.in.Service in {1,2,3,4}:
## : :...Baggage.Handling = 5:
## : :...In.flight.Wifi.Service = 1: 0 (2)
## : : In.flight.Wifi.Service in {2,3}: 1 (49)
## : Baggage.Handling in {1,2,3,4}:
## : :...In.flight.Service = 0: 0 (0)
## : In.flight.Service = 5: 1 (35/1)
## : In.flight.Service in {1,2,3,4}:
## : :...Seat.Comfort = 0: 0 (0)
## : Seat.Comfort = 5:
## : :...In.flight.Wifi.Service = 1: 0 (12)
## : : In.flight.Wifi.Service in {2,
## : : 3}: 1 (22/1)
## : Seat.Comfort in {1,2,3,4}:
## : :...Cleanliness = 0: 0 (0)
## : Cleanliness = 5: 1 (14)
## : Cleanliness in {1,2,3,4}:
## : :...On.board.Service = 0: 0 (0)
## : On.board.Service = 5: [S10]
## : On.board.Service in {1,2,3,4}:
## : :...Age.log <= 3.526361: 0 (1150)
## : Age.log > 3.526361: [S11]
## Online.Boarding in {4,5}:
## :...Type.of.Travel = Personal: 0 (8120/932)
## Type.of.Travel = Business:
## :...Comfortability in {11,12,13,14,15}:
## :...Customer.Type = Returning:
## : :...Class = Business:
## : : :...Check.in.Service = 0: 1 (0)
## : : : Check.in.Service in {3,4,5}:
## : : : :...Leg.Room.Service = 0: 1 (0)
## : : : : Leg.Room.Service in {1,2,4,5}:
## : : : : :...In.flight.Wifi.Service in {1,2,
## : : : : : : 3}: 1 (10932/12)
## : : : : : In.flight.Wifi.Service = 4:
## : : : : : :...Gate.Location in {0,1,2,3,
## : : : : : : 5}: 0 (82/2)
## : : : : : Gate.Location = 4: 1 (3686/19)
## : : : : Leg.Room.Service = 3:
## : : : : :...Food.and.Drink = 0: 1 (0)
## : : : : Food.and.Drink = 1: 0 (5)
## : : : : Food.and.Drink in {2,3,4,5}:
## : : : : :...In.flight.Wifi.Service in {1,
## : : : : : 2}: 1 (661/2)
## : : : : In.flight.Wifi.Service in {3,4}:
## : : : : :...Gate.Location in {0,1,2,
## : : : : : 5}: 0 (53)
## : : : : Gate.Location in {3,4}: 1 (654/24)
## : : : Check.in.Service in {1,2}:
## : : : :...In.flight.Wifi.Service in {1,2}:
## : : : :...Gate.Location in {1,2}: 1 (333/1)
## : : : : Gate.Location in {0,3,4,5}: 0 (4)
## : : : In.flight.Wifi.Service in {3,4}:
## : : : :...Gate.Location in {0,1,2,5}: 0 (77)
## : : : Gate.Location in {3,4}:
## : : : :...Online.Boarding = 5: 1 (176)
## : : : Online.Boarding = 4:
## : : : :...In.flight.Service in {0,1,2,
## : : : : 5}: 1 (71/2)
## : : : In.flight.Service in {3,4}:
## : : : :...Seat.Comfort in {0,1}: 0 (0)
## : : : Seat.Comfort in {2,5}: 1 (17)
## : : : Seat.Comfort in {3,4}: [S12]
## : : Class in {Economy,Economy Plus}:
## : : :...Baggage.Handling in {1,2,5}:
## : : :...Baggage.Handling in {1,5}: 1 (400/3)
## : : : Baggage.Handling = 2:
## : : : :...Flight.Entertainment in {0,1,2,3,
## : : : : 10}: 1 (0)
## : : : Flight.Entertainment in {5,7}: 0 (10/3)
## : : : Flight.Entertainment in {4,6,8,9}:
## : : : :...In.flight.Service in {0,4,5}: 1 (86)
## : : : In.flight.Service in {1,2,3}:
## : : : :...Gender = Female: 1 (46/2)
## : : : Gender = Male: [S13]
## : : Baggage.Handling in {3,4}:
## : : :...Check.in.Service in {0,5}: 1 (117)
## : : Check.in.Service in {1,2,3,4}:
## : : :...Online.Boarding = 5: 1 (65)
## : : Online.Boarding = 4:
## : : :...In.flight.Service = 0: 1 (0)
## : : In.flight.Service in {1,2,5}: [S14]
## : : In.flight.Service in {3,4}:
## : : :...On.board.Service = 0: 0 (0)
## : : On.board.Service = 5: [S15]
## : : On.board.Service in {1,2,3,4}:
## : : :...Seat.Comfort in {0,
## : : : 1}: 0 (0)
## : : Seat.Comfort in {2,3,4}:
## : : :...Cleanliness in {0,1,2,3,
## : : : : 4}: 0 (374/108)
## : : : Cleanliness = 5: 1 (10)
## : : Seat.Comfort = 5: [S16]
## : Customer.Type = First-time:
## : :...In.flight.Wifi.Service in {1,2,3}: 0 (227/4)
## : In.flight.Wifi.Service = 4:
## : :...Check.in.Service = 0: 1 (0)
## : Check.in.Service in {1,2}:
## : :...In.flight.Service = 0: 0 (0)
## : : In.flight.Service in {1,2,5}:
## : : :...Age_Range = Adult: 0 (41/10)
## : : : Age_Range = Children: 1 (1)
## : : : Age_Range = Senior:
## : : : :...In.flight.Service in {1,2}: 0 (3)
## : : : : In.flight.Service = 5: 1 (2)
## : : : Age_Range = Youth:
## : : : :...Baggage.Handling in {1,2,3}: 0 (14/6)
## : : : Baggage.Handling in {4,5}: 1 (11)
## : : In.flight.Service in {3,4}:
## : : :...Gate.Location in {0,3,4}: 0 (129/5)
## : : Gate.Location in {1,2,5}:
## : : :...Leg.Room.Service in {1,3}: 1 (5)
## : : Leg.Room.Service in {0,2,4,5}: 0 (20/4)
## : Check.in.Service in {3,4,5}:
## : :...In.flight.Service = 0: 1 (0)
## : In.flight.Service in {2,3}:
## : :...Gate.Location = 0: 0 (0)
## : : Gate.Location in {1,2}:
## : : :...arrival.delay.status = Delayed: 0 (3)
## : : : arrival.delay.status = On Time: 1 (39/13)
## : : Gate.Location in {3,4,5}: [S17]
## : In.flight.Service in {1,4,5}:
## : :...On.board.Service = 0: 1 (0)
## : On.board.Service in {1,2}:
## : :...In.flight.Service = 4: 0 (38/3)
## : : In.flight.Service in {1,5}:
## : : :...Comfortability in {11,13}: 1 (19/3)
## : : Comfortability in {14,15}: 0 (6/2)
## : : Comfortability = 12:
## : : :...Cleanliness in {0,1,2,3,
## : : : 4}: 0 (5)
## : : Cleanliness = 5: 1 (5/1)
## : On.board.Service in {3,4,5}:
## : :...Age_Range in {Children,
## : : Youth}: 1 (360/77)
## : Age_Range in {Adult,Senior}:
## : :...Class = Economy: 0 (26/3)
## : Class in {Business,Economy Plus}:
## : :...Leg.Room.Service = 1: 0 (2)
## : Leg.Room.Service in {0,3,
## : : 4}: 1 (145/54)
## : Leg.Room.Service = 2:
## : :...Gate.Location = 0: 1 (0)
## : : Gate.Location = 5: 0 (2)
## : : Gate.Location = 1: [S18]
## : : Gate.Location = 2:
## : : :...Gender = Female: 0 (5/1)
## : : : Gender = Male: 1 (3)
## : : Gate.Location = 3:
## : : :...Arrival.Delay <= 3: 1 (12/2)
## : : : Arrival.Delay > 3: 0 (2)
## : : Gate.Location = 4: [S19]
## : Leg.Room.Service = 5: [S20]
## Comfortability in {3,4,5,6,7,8,9,10}:
## :...Online.Boarding = 5:
## :...Customer.Type = Returning: 1 (721/3)
## : Customer.Type = First-time:
## : :...Class in {Economy,Economy Plus}:
## : :...Gate.Location in {0,2,3,4}: 0 (58/7)
## : : Gate.Location in {1,5}:
## : : :...Departure.Delay <= 1: 1 (10/1)
## : : Departure.Delay > 1: 0 (3)
## : Class = Business:
## : :...In.flight.Wifi.Service = 3: 1 (0)
## : In.flight.Wifi.Service = 1: 0 (3)
## : In.flight.Wifi.Service in {2,4}:
## : :...Age_Range in {Children,Senior,
## : : Youth}: 1 (29)
## : Age_Range = Adult:
## : :...Leg.Room.Service = 3: 0 (9/2)
## : Leg.Room.Service in {0,1,5}: 1 (4)
## : Leg.Room.Service = 2:
## : :...Age.log <= 3.583519: 0 (2)
## : : Age.log > 3.583519: 1 (4)
## : Leg.Room.Service = 4:
## : :...Gender = Female: 0 (2)
## : Gender = Male: 1 (3)
## Online.Boarding = 4:
## :...Flight.Entertainment in {0,1,10}: 0 (0)
## Flight.Entertainment in {2,4}:
## :...Check.in.Service = 0: 0 (0)
## : Check.in.Service = 5:
## : :...Customer.Type = First-time: 0 (4)
## : : Customer.Type = Returning: 1 (36)
## : Check.in.Service in {1,2,3,4}:
## : :...Seat.Comfort = 0: 0 (0)
## : Seat.Comfort in {2,5}:
## : :...Age.log <= 3.135494: 0 (2)
## : : Age.log > 3.135494: 1 (22/1)
## : Seat.Comfort in {1,3,4}:
## : :...In.flight.service in {1,2,14,
## : : 15}: 0 (0)
## : In.flight.service in {11,12,13}:
## : :...Cleanliness in {1,2,4}: 0 (9/1)
## : : Cleanliness in {0,3,5}: 1 (19)
## : In.flight.service in {3,4,5,6,7,8,9,10}:
## : :...In.flight.Entertainment in {4,
## : : 5}: 0 (0)
## : In.flight.Entertainment = 3:
## : :...Gate.Location in {0,2,3,4,5}: 0 (22)
## : : Gate.Location = 1:
## : : :...Departure.Delay <= 3: 1 (12)
## : : Departure.Delay > 3: 0 (2)
## : In.flight.Entertainment in {0,1,2}:
## : :...Comfortability in {3,4,5,6,7,8,9}:
## : :...Gate.Location = 0: 1 (1)
## : : Gate.Location in {1,3,4,
## : : : 5}: 0 (850/4)
## : : Gate.Location = 2:
## : : :...Cleanliness in {0,1,2,4,
## : : : 5}: 0 (180/2)
## : : Cleanliness = 3: [S21]
## : Comfortability = 10:
## : :...Gate.Location = 0: 0 (0)
## : Gate.Location in {1,3,4,5}: [S22]
## : Gate.Location = 2: [S23]
## Flight.Entertainment in {3,5,6,7,8,9}:
## :...In.flight.Service = 0: 1 (0)
## In.flight.Service = 3:
## :...Check.in.Service = 0: 0 (0)
## : Check.in.Service = 5:
## : :...In.flight.Entertainment in {0,3,4,
## : : : 5}: 1 (70)
## : : In.flight.Entertainment in {1,2}:
## : : :...Class = Business: 1 (2)
## : : Class in {Economy,Economy Plus}: 0 (8/1)
## : Check.in.Service in {1,2,3,4}:
## : :...Baggage.Handling in {1,5}:
## : :...Age_Range = Senior: 0 (4)
## : : Age_Range in {Adult,Children,Youth}:
## : : :...Customer.Type = Returning: 1 (57)
## : : Customer.Type = First-time:
## : : :...Departure.Delay <= 1: 1 (20/5)
## : : Departure.Delay > 1: 0 (5)
## : Baggage.Handling in {2,3,4}:
## : :...In.flight.service in {1,2,3,4,13,14,
## : : 15}: 0 (0)
## : In.flight.service = 12:
## : :...Customer.Type = First-time: 0 (4)
## : : Customer.Type = Returning: [S24]
## : In.flight.service in {5,6,7,8,9,10,11}:
## : :...In.flight.Wifi.Service = 3:
## : :...Comfortability in {3,
## : : : 4}: 0 (0)
## : : Comfortability = 5: 1 (2)
## : : Comfortability in {6,7,8,9,10}: [S25]
## : In.flight.Wifi.Service in {1,2,4}:
## : :...Customer.Type = First-time:
## : :...Gate.Location in {0,3,
## : : : 4}: 0 (208/16)
## : : Gate.Location in {1,2,5}: [S26]
## : Customer.Type = Returning:
## : :...Age_Range in {Children,Senior,
## : : Youth}: [S27]
## : Age_Range = Adult: [S28]
## In.flight.Service in {1,2,4,5}:
## :...Check.in.Service = 5:
## :...Customer.Type = Returning: 1 (354)
## : Customer.Type = First-time:
## : :...In.flight.Wifi.Service in {1,
## : : 2}: 1 (0)
## : In.flight.Wifi.Service = 3: 0 (14)
## : In.flight.Wifi.Service = 4:
## : :...Leg.Room.Service = 0: 1 (0)
## : Leg.Room.Service = 1: 0 (14/3)
## : Leg.Room.Service in {2,3,4,5}:
## : :...Age.log <= 3.433987: 1 (230/47)
## : Age.log > 3.433987:
## : :...Age.log <= 3.610918: 0 (23)
## : Age.log > 3.610918: [S29]
## Check.in.Service in {0,1,2,3,4}:
## :...In.flight.Service = 5:
## :...Customer.Type = Returning:
## : :...Ease.of.Online.Booking = 0: 0 (6)
## : : Ease.of.Online.Booking in {1,2,3,4,5}:
## : : :...Leg.Room.Service in {0,1,2,3,
## : : : 5}: 1 (327)
## : : Leg.Room.Service = 4: [S30]
## : Customer.Type = First-time:
## : :...In.flight.Wifi.Service in {1,
## : : 2}: 1 (0)
## : In.flight.Wifi.Service = 3: 0 (12/1)
## : In.flight.Wifi.Service = 4:
## : :...Age.log <= 3.218876: 1 (147/27)
## : Age.log > 3.218876: [S31]
## In.flight.Service in {1,2,4}:
## :...Baggage.Handling = 5:
## :...Customer.Type = Returning: 1 (141)
## : Customer.Type = First-time: [S32]
## Baggage.Handling in {1,2,3,4}:
## :...On.board.Service = 0: 0 (0)
## On.board.Service = 5:
## :...Customer.Type = Returning: [S33]
## : Customer.Type = First-time: [S34]
## On.board.Service in {1,2,3,4}:
## :...Customer.Type = First-time:
## :...Class = Business: [S35]
## : Class in {Economy,Economy Plus}: [S36]
## Customer.Type = Returning:
## :...Seat.Comfort in {1,2,5}:
## :...Age.log > 3.465736: 1 (152/1)
## : Age.log <= 3.465736: [S37]
## Seat.Comfort in {0,3,4}: [S38]
##
## SubTree [S1]
##
## Departure.and.Arrival.Time.Convenience = 0: 0 (0)
## Departure.and.Arrival.Time.Convenience = 5: 1 (14/3)
## Departure.and.Arrival.Time.Convenience in {1,2,3,4}:
## :...Class = Business: 0 (52/1)
## Class in {Economy,Economy Plus}:
## :...Baggage.Handling in {1,2,5}: 1 (22/1)
## Baggage.Handling in {3,4}: 0 (24/5)
##
## SubTree [S2]
##
## Class in {Business,Economy}: 1 (10)
## Class = Economy Plus: 0 (2)
##
## SubTree [S3]
##
## Online.Boarding in {0,1}: 1 (38/5)
## Online.Boarding in {2,3}:
## :...In.flight.service in {1,2,3,4,5,6,7,8,11,15}: 0 (29/11)
## In.flight.service = 14: 1 (2)
## In.flight.service = 9:
## :...Online.Boarding = 2: 1 (5)
## : Online.Boarding = 3: 0 (14/5)
## In.flight.service = 10:
## :...Distance_Group = Long-haul: 1 (8/2)
## : Distance_Group = Medium-haul: 0 (11/3)
## : Distance_Group = Short-haul:
## : :...Age_Range in {Adult,Children,Youth}: 1 (15/3)
## : Age_Range = Senior: 0 (2)
## In.flight.service = 12:
## :...Cleanliness in {2,4}: 0 (27/10)
## : Cleanliness = 1:
## : :...Online.Boarding = 2: 1 (4/1)
## : : Online.Boarding = 3: 0 (6)
## : Cleanliness = 3:
## : :...Class in {Business,Economy Plus}: 1 (7)
## : Class = Economy: 0 (2)
## In.flight.service = 13:
## :...arrival.delay.status = Delayed: 0 (4)
## arrival.delay.status = On Time:
## :...Seat.Comfort in {2,4}: 1 (12/2)
## Seat.Comfort = 3:
## :...Comfortability in {3,4,5,6,7,8,10,12,13,14,15}: 0 (8/1)
## Comfortability in {9,11}: 1 (6/1)
##
## SubTree [S4]
##
## Departure.and.Arrival.Time.Convenience = 0: 0 (25)
## Departure.and.Arrival.Time.Convenience in {1,2,3,4,5}:
## :...Leg.Room.Service = 0: 1 (0)
## Leg.Room.Service in {1,2,3}:
## :...Baggage.Handling in {1,2,3}: 0 (42/3)
## : Baggage.Handling in {4,5}:
## : :...Flight.Distance.log > 7.952967: 0 (4)
## : Flight.Distance.log <= 7.952967:
## : :...On.board.Service in {0,1,3,4}: 1 (20)
## : On.board.Service in {2,5}:
## : :...Departure.and.Arrival.Time.Convenience = 2: 1 (2)
## : Departure.and.Arrival.Time.Convenience in {1,3,4,5}: 0 (3)
## Leg.Room.Service in {4,5}:
## :...Age_Range in {Children,Youth}:
## :...Baggage.Handling in {1,2,3,4}: 0 (7)
## : Baggage.Handling = 5: 1 (2)
## Age_Range in {Adult,Senior}:
## :...Baggage.Handling in {1,2}:
## :...Leg.Room.Service = 4: 0 (24/2)
## : Leg.Room.Service = 5: 1 (3)
## Baggage.Handling in {4,5}:
## :...In.flight.Service in {1,2}: 0 (5/1)
## : In.flight.Service in {0,3,4,5}: 1 (1446/1)
## Baggage.Handling = 3:
## :...In.flight.Service in {0,1}: 1 (0)
## In.flight.Service in {2,3}: 0 (5)
## In.flight.Service in {4,5}:
## :...In.flight.Wifi.Service in {1,2}: 1 (37)
## In.flight.Wifi.Service = 3:
## :...Gate.Location in {1,2}: 0 (5)
## Gate.Location = 3: 1 (20)
##
## SubTree [S5]
##
## Type.of.Travel = Business: 1 (53)
## Type.of.Travel = Personal: 0 (21)
##
## SubTree [S6]
##
## Customer.Type = First-time: 0 (79/1)
## Customer.Type = Returning:
## :...Type.of.Travel = Business: 1 (30/1)
## Type.of.Travel = Personal: 0 (21)
##
## SubTree [S7]
##
## On.board.Service in {0,1,2}: 0 (1647/20)
## On.board.Service = 5:
## :...Customer.Type = First-time: 0 (38/1)
## : Customer.Type = Returning:
## : :...In.flight.Service in {0,1,2}: 0 (14)
## : In.flight.Service in {3,4}:
## : :...Distance_Group = Short-haul: 0 (5)
## : Distance_Group in {Long-haul,Medium-haul}:
## : :...Comfortability in {3,4,6,7,8,9,11,12,13,14,15}: 1 (14)
## : Comfortability in {5,10}: 0 (3)
## On.board.Service in {3,4}:
## :...In.flight.Wifi.Service in {2,3}: 0 (1101/31)
## In.flight.Wifi.Service = 1:
## :...Gate.Location in {2,3}: 0 (167)
## Gate.Location = 1:
## :...In.flight.Entertainment in {0,1,2}: 0 (56/1)
## In.flight.Entertainment = 3:
## :...Customer.Type = First-time: 0 (3)
## Customer.Type = Returning:
## :...Baggage.Handling in {2,3,4}: 1 (26/3)
## Baggage.Handling = 1:
## :...Age.log <= 3.465736: 1 (3)
## Age.log > 3.465736: 0 (9)
##
## SubTree [S8]
##
## Baggage.Handling = 5: 0 (0)
## Baggage.Handling = 1: 1 (4)
## Baggage.Handling in {2,3,4}:
## :...Comfortability = 4: 1 (2)
## Comfortability in {3,5,6,7,8,9,10,11,12,13,14,15}: 0 (42)
##
## SubTree [S9]
##
## In.flight.Service in {0,5}: 1 (67)
## In.flight.Service in {1,2,3,4}:
## :...Check.in.Service in {0,5}: 1 (49)
## Check.in.Service in {1,2,3,4}:
## :...Baggage.Handling = 5: 1 (25)
## Baggage.Handling in {1,2,3,4}:
## :...Seat.Comfort in {0,5}: 1 (15)
## Seat.Comfort in {1,2,3,4}:
## :...Leg.Room.Service = 5: 1 (11)
## Leg.Room.Service in {0,1,2,3,4}:
## :...Online.Boarding = 0: 1 (0)
## Online.Boarding = 1:
## :...Baggage.Handling = 1: 0 (2)
## : Baggage.Handling in {2,3,4}: 1 (11)
## Online.Boarding = 2:
## :...In.flight.Entertainment = 3: 0 (18/4)
## : In.flight.Entertainment = 2:
## : :...Ease.of.Online.Booking = 0: 1 (0)
## : Ease.of.Online.Booking in {2,4}: 0 (6)
## : Ease.of.Online.Booking in {1,3,5}:
## : :...Seat.Comfort in {1,3}: 0 (4/1)
## : Seat.Comfort in {2,4}: 1 (29)
## Online.Boarding = 3:
## :...Distance_Group in {Long-haul,Short-haul}: 0 (42/10)
## Distance_Group = Medium-haul:
## :...Ease.of.Online.Booking in {0,5}: 0 (3)
## Ease.of.Online.Booking in {2,4}: 1 (11)
## Ease.of.Online.Booking = 1:
## :...Departure.Delay <= 1: 0 (3)
## : Departure.Delay > 1: 1 (2)
## Ease.of.Online.Booking = 3:
## :...Departure.Delay <= 1: 0 (2)
## Departure.Delay > 1: 1 (3)
##
## SubTree [S10]
##
## In.flight.Wifi.Service = 1: 0 (16)
## In.flight.Wifi.Service in {2,3}:
## :...Distance_Group = Long-haul: 0 (10)
## Distance_Group in {Medium-haul,Short-haul}:
## :...Leg.Room.Service in {0,1,2,5}: 1 (21/2)
## Leg.Room.Service in {3,4}:
## :...In.flight.Entertainment in {0,1,4,5}: 0 (12/1)
## In.flight.Entertainment = 2: 1 (11/4)
## In.flight.Entertainment = 3:
## :...Age.log <= 3.850147: 1 (3)
## Age.log > 3.850147: 0 (6)
##
## SubTree [S11]
##
## In.flight.Wifi.Service = 1: 0 (747)
## In.flight.Wifi.Service in {2,3}:
## :...In.flight.Entertainment = 0: 0 (0)
## In.flight.Entertainment = 5: 1 (3)
## In.flight.Entertainment in {1,2,3,4}:
## :...In.flight.Service = 1:
## :...Flight.Entertainment in {0,1,2,3,5,8,9,10}: 0 (26/1)
## : Flight.Entertainment in {4,6,7}:
## : :...Baggage.Handling in {3,4}: 1 (22/6)
## : Baggage.Handling in {1,2}:
## : :...Class = Economy: 0 (38/6)
## : Class = Economy Plus:
## : :...Comfortability in {3,4,5,6,7,9,10,12,13,14,
## : : 15}: 1 (8/1)
## : Comfortability in {8,11}: 0 (3)
## In.flight.Service in {2,3,4}:
## :...Baggage.Handling = 1:
## :...Flight.Entertainment in {0,1,2,8,9,10}: 0 (0)
## : Flight.Entertainment in {3,5,7}: 1 (6)
## : Flight.Entertainment in {4,6}:
## : :...In.flight.Service = 2: 0 (36/6)
## : In.flight.Service = 3:
## : :...Flight.Distance.log <= 7.290975: 1 (5)
## : : Flight.Distance.log > 7.290975: 0 (8)
## : In.flight.Service = 4:
## : :...Pre.flight.service in {6,8}: 0 (3)
## : Pre.flight.service in {1,2,3,4,5,7,9,10,11,12,13,14,
## : 15}: 1 (7)
## Baggage.Handling in {2,3,4}:
## :...Online.Boarding = 0: 0 (0)
## Online.Boarding = 1:
## :...Seat.Comfort in {3,4}: 1 (10/1)
## : Seat.Comfort in {1,2}:
## : :...Comfortability in {3,4,5,6,7,10,11,12,13,14,
## : : 15}: 0 (30)
## : Comfortability in {8,9}:
## : :...Food.and.Drink in {0,1,2,3,5}: 0 (12/2)
## : Food.and.Drink = 4: 1 (4)
## Online.Boarding in {2,3}:
## :...Baggage.Handling = 2:
## :...In.flight.Service in {2,3}: 0 (557/35)
## : In.flight.Service = 4:
## : :...Flight.Distance.log <= 6.133398: 1 (8)
## : Flight.Distance.log > 6.133398: 0 (15/2)
## Baggage.Handling in {3,4}:
## :...In.flight.Service in {3,4}: 0 (1766/44)
## In.flight.Service = 2:
## :...Baggage.Handling = 3: 0 (94/2)
## Baggage.Handling = 4:
## :...Age.log <= 3.931826: 1 (13/2)
## Age.log > 3.931826: 0 (12/1)
##
## SubTree [S12]
##
## Flight.Entertainment in {0,1,2,3,4,5,6,8,9,10}: 0 (67/15)
## Flight.Entertainment = 7: 1 (13/1)
##
## SubTree [S13]
##
## Pre.flight.service in {6,8}: 0 (14/4)
## Pre.flight.service in {1,2,3,4,5,7,9,10,11,12,13,14,15}: 1 (83/13)
##
## SubTree [S14]
##
## In.flight.Entertainment = 0: 1 (0)
## In.flight.Entertainment in {1,2,3}:
## :...Cleanliness in {0,1,2,5}: 0 (0)
## : Cleanliness = 3: 1 (8)
## : Cleanliness = 4:
## : :...In.flight.Service in {1,2}: 0 (20/4)
## : In.flight.Service = 5: 1 (4)
## In.flight.Entertainment in {4,5}:
## :...Baggage.Handling = 4: 1 (97/1)
## Baggage.Handling = 3:
## :...Departure.Delay <= 21: 1 (76/13)
## Departure.Delay > 21: 0 (3)
##
## SubTree [S15]
##
## In.flight.Entertainment = 2: 0 (3)
## In.flight.Entertainment in {0,1,3,4,5}: 1 (28/2)
##
## SubTree [S16]
##
## In.flight.Entertainment in {0,1,2,3,4}: 1 (21)
## In.flight.Entertainment = 5: 0 (14/1)
##
## SubTree [S17]
##
## Departure.and.Arrival.Time.Convenience in {1,2,3,4}: 0 (88/6)
## Departure.and.Arrival.Time.Convenience in {0,5}:
## :...Comfortability in {13,14}: 0 (8)
## Comfortability in {11,12,15}:
## :...Check.in.Service = 3: 0 (7/2)
## Check.in.Service in {4,5}: 1 (8)
##
## SubTree [S18]
##
## Ease.of.Online.Booking in {0,1,2,3,4}: 0 (9)
## Ease.of.Online.Booking = 5: 1 (2)
##
## SubTree [S19]
##
## Distance_Group in {Long-haul,Short-haul}: 1 (13/3)
## Distance_Group = Medium-haul: 0 (3)
##
## SubTree [S20]
##
## Pre.flight.service in {10,14,15}: 0 (6/2)
## Pre.flight.service in {1,2,3,4,5,6,7,8,9,12}: 1 (42/12)
## Pre.flight.service = 11:
## :...Distance_Group = Long-haul: 0 (3)
## : Distance_Group in {Medium-haul,Short-haul}: 1 (45/10)
## Pre.flight.service = 13:
## :...Comfortability in {11,14}: 1 (19/7)
## Comfortability in {12,15}: 0 (8/1)
## Comfortability = 13:
## :...Age.log <= 3.465736: 1 (4/1)
## Age.log > 3.465736:
## :...In.flight.service in {9,14}: 1 (3)
## In.flight.service in {1,2,3,4,5,6,7,8,10,11,12,13,15}: 0 (7)
##
## SubTree [S21]
##
## On.board.Service in {0,1,3,4,5}: 0 (23)
## On.board.Service = 2:
## :...Check.in.Service in {1,2,4}: 0 (25/4)
## Check.in.Service = 3: 1 (7/1)
##
## SubTree [S22]
##
## In.flight.Wifi.Service in {1,2,4}: 0 (91/2)
## In.flight.Wifi.Service = 3:
## :...Customer.Type = First-time: 0 (2)
## Customer.Type = Returning: 1 (3)
##
## SubTree [S23]
##
## Age_Range in {Children,Youth}: 1 (0)
## Age_Range = Senior: 0 (4)
## Age_Range = Adult:
## :...Check.in.Service in {1,2}: 0 (5/1)
## Check.in.Service = 4: 1 (12/2)
## Check.in.Service = 3:
## :...Food.and.Drink = 2: 0 (2)
## Food.and.Drink in {0,1,3,4,5}: 1 (6)
##
## SubTree [S24]
##
## In.flight.Entertainment = 1: 0 (2)
## In.flight.Entertainment in {0,2,3,4,5}: 1 (15)
##
## SubTree [S25]
##
## In.flight.Entertainment in {0,1}: 0 (0)
## In.flight.Entertainment in {4,5}:
## :...Customer.Type = First-time: 0 (6)
## : Customer.Type = Returning: 1 (7/1)
## In.flight.Entertainment in {2,3}:
## :...Seat.Comfort in {0,1,3,4}: 0 (621/13)
## Seat.Comfort in {2,5}:
## :...On.board.Service in {0,1,2,4,5}: 0 (8)
## On.board.Service = 3: 1 (7)
##
## SubTree [S26]
##
## Age_Range = Adult: 0 (14)
## Age_Range in {Children,Senior,Youth}:
## :...Baggage.Handling in {2,3}: 0 (13/4)
## Baggage.Handling = 4: 1 (8)
##
## SubTree [S27]
##
## In.flight.Wifi.Service = 4: 0 (84/6)
## In.flight.Wifi.Service = 1:
## :...Seat.Comfort in {0,1,2,3,5}: 0 (4)
## : Seat.Comfort = 4: 1 (4)
## In.flight.Wifi.Service = 2:
## :...Baggage.Handling in {2,3}: 0 (11/1)
## Baggage.Handling = 4: 1 (3)
##
## SubTree [S28]
##
## Seat.Comfort = 0: 0 (0)
## Seat.Comfort in {2,5}: 1 (9)
## Seat.Comfort = 1:
## :...In.flight.Entertainment in {2,4}: 0 (3)
## : In.flight.Entertainment in {0,1,3,5}: 1 (5)
## Seat.Comfort = 3:
## :...Leg.Room.Service in {0,1,2,3,5}: 0 (32)
## : Leg.Room.Service = 4:
## : :...Cleanliness = 1: 1 (3)
## : Cleanliness in {0,2,3,4,5}: 0 (3)
## Seat.Comfort = 4:
## :...Distance_Group = Medium-haul:
## :...Cleanliness in {0,1,2,4,5}: 0 (37/8)
## : Cleanliness = 3: 1 (6)
## Distance_Group = Long-haul:
## :...In.flight.Wifi.Service = 1: 1 (2)
## : In.flight.Wifi.Service = 4: 0 (19)
## : In.flight.Wifi.Service = 2:
## : :...Baggage.Handling = 2: 0 (6)
## : Baggage.Handling in {3,4}: 1 (3)
## Distance_Group = Short-haul:
## :...Flight.Entertainment in {3,5}: 0 (9/3)
## Flight.Entertainment in {6,7,9}: 1 (10/1)
## Flight.Entertainment = 8:
## :...Pre.flight.service in {6,12,13}: 1 (14/3)
## Pre.flight.service in {1,2,3,4,5,9,10,14,15}: 0 (35/12)
## Pre.flight.service = 7:
## :...Class in {Business,Economy}: 0 (8/1)
## : Class = Economy Plus: 1 (2)
## Pre.flight.service = 8:
## :...Gender = Female: 0 (5)
## : Gender = Male: 1 (6/1)
## Pre.flight.service = 11:
## :...Baggage.Handling in {2,3}: 1 (6/1)
## Baggage.Handling = 4: 0 (9/2)
##
## SubTree [S29]
##
## Class = Business: 1 (81/23)
## Class in {Economy,Economy Plus}: 0 (20/5)
##
## SubTree [S30]
##
## In.flight.service in {1,2,3,4,5,6,7,8,9,10,11,12,13,14}: 1 (23/1)
## In.flight.service = 15: 0 (3)
##
## SubTree [S31]
##
## Check.in.Service = 0: 1 (0)
## Check.in.Service in {1,2}: 0 (26/4)
## Check.in.Service in {3,4}:
## :...Ease.of.Online.Booking in {0,1,2,3,5}: 1 (5)
## Ease.of.Online.Booking = 4:
## :...Comfortability in {3,7}: 0 (33/13)
## Comfortability = 4: 1 (12/3)
## Comfortability = 5:
## :...In.flight.service in {9,11}: 1 (11/3)
## : In.flight.service in {1,2,3,4,5,6,7,8,10,12,13,14,15}: 0 (5)
## Comfortability = 6:
## :...In.flight.service in {1,2,3,4,5,6,7,8,10,11,13,14,15}: 1 (16/2)
## : In.flight.service = 9: 0 (5/1)
## : In.flight.service = 12:
## : :...Flight.Distance.log <= 5.937536: 0 (3)
## : Flight.Distance.log > 5.937536: 1 (2)
## Comfortability = 8:
## :...Leg.Room.Service in {0,1,5}: 0 (0)
## : Leg.Room.Service in {3,4}: 1 (10/2)
## : Leg.Room.Service = 2:
## : :...Departure.Delay <= 2: 0 (12/2)
## : Departure.Delay > 2: 1 (4/1)
## Comfortability = 9:
## :...In.flight.service in {1,2,3,4,5,6,7,8,14,15}: 1 (0)
## : In.flight.service in {9,11,13}: 0 (13/4)
## : In.flight.service = 10:
## : :...Flight.Distance.log <= 6.349139: 0 (3)
## : : Flight.Distance.log > 6.349139: 1 (3)
## : In.flight.service = 12:
## : :...Departure.Delay <= 2: 1 (13/2)
## : Departure.Delay > 2: 0 (2)
## Comfortability = 10:
## :...delay.status = On-time: 1 (34/8)
## delay.status = Delayed:
## :...Departure.Delay <= 20: 0 (4)
## Departure.Delay > 20: 1 (2)
##
## SubTree [S32]
##
## In.flight.Wifi.Service in {1,2}: 1 (0)
## In.flight.Wifi.Service = 3: 0 (3)
## In.flight.Wifi.Service = 4:
## :...In.flight.Service in {1,2}: 0 (19/5)
## In.flight.Service = 4:
## :...Age_Range in {Children,Youth}: 1 (88/13)
## Age_Range in {Adult,Senior}:
## :...Comfortability in {5,6}: 0 (13/5)
## Comfortability in {3,7,8,10}: 1 (43/14)
## Comfortability = 4:
## :...Gate.Location = 1: 1 (4)
## : Gate.Location in {0,2,3,4,5}: 0 (4)
## Comfortability = 9:
## :...Age.log <= 3.7612: 0 (11/1)
## Age.log > 3.7612: 1 (4)
##
## SubTree [S33]
##
## In.flight.Entertainment in {1,2}: 0 (5/1)
## In.flight.Entertainment in {0,3,4,5}: 1 (101/1)
##
## SubTree [S34]
##
## In.flight.Wifi.Service = 1: 1 (0)
## In.flight.Wifi.Service in {2,3}: 0 (5)
## In.flight.Wifi.Service = 4:
## :...Class = Business: 1 (51/12)
## Class = Economy Plus: 0 (2)
## Class = Economy:
## :...Comfortability in {3,8}: 1 (8/2)
## Comfortability in {4,5,6}: 0 (14/2)
## Comfortability = 7:
## :...Seat.Comfort = 1: 1 (4)
## : Seat.Comfort in {0,2,3,4,5}: 0 (6/1)
## Comfortability = 9:
## :...In.flight.Service in {1,4}: 1 (8/1)
## : In.flight.Service = 2: 0 (4)
## Comfortability = 10:
## :...In.flight.service in {1,2,3,4,5,6,7,8,12,13,14,15}: 0 (5)
## In.flight.service in {9,10,11}: 1 (4)
##
## SubTree [S35]
##
## In.flight.Wifi.Service = 1: 1 (0)
## In.flight.Wifi.Service in {2,3}: 0 (6)
## In.flight.Wifi.Service = 4:
## :...Pre.flight.service in {9,10}: 0 (19/4)
## Pre.flight.service in {1,2,3,4,5,6,7,8,11,12,13,14,15}: 1 (122/41)
##
## SubTree [S36]
##
## Age_Range in {Adult,Children,Senior}: 0 (228/19)
## Age_Range = Youth:
## :...Departure.and.Arrival.Time.Convenience in {0,5}: 1 (31/11)
## Departure.and.Arrival.Time.Convenience in {1,2,3,4}: 0 (110/19)
##
## SubTree [S37]
##
## Seat.Comfort in {1,2}: 0 (29)
## Seat.Comfort = 5: 1 (5)
##
## SubTree [S38]
##
## In.flight.Entertainment in {0,1}: 0 (52/3)
## In.flight.Entertainment in {2,3,4,5}:
## :...In.flight.Service = 1: 1 (111/13)
## In.flight.Service in {2,4}:
## :...In.flight.Wifi.Service = 2:
## :...Gate.Location in {1,3}: 0 (4)
## : Gate.Location in {0,2,4,5}: 1 (38/2)
## In.flight.Wifi.Service in {1,3,4}:
## :...Cleanliness = 0: 0 (0)
## Cleanliness in {1,2}:
## :...Baggage.Handling in {1,3}: 0 (32)
## : Baggage.Handling in {2,4}:
## : :...Leg.Room.Service in {0,1,3,5}: 0 (47/1)
## : Leg.Room.Service in {2,4}:
## : :...In.flight.Wifi.Service in {1,3}: 1 (15)
## : In.flight.Wifi.Service = 4:
## : :...Class = Business: 0 (111/12)
## : Class in {Economy,Economy Plus}:
## : :...Comfortability in {3,4,5,6,7}: 0 (0)
## : Comfortability in {8,10}:
## : :...Flight.Distance.log <= 6.922644: 0 (29/4)
## : : Flight.Distance.log > 6.922644: 1 (5/1)
## : Comfortability = 9:
## : :...Departure.Delay <= 12: 1 (27/7)
## : Departure.Delay > 12: 0 (5)
## Cleanliness in {3,4,5}:
## :...Baggage.Handling in {3,4}:
## :...In.flight.Service = 2:
## : :...In.flight.Entertainment = 5: 1 (0)
## : : In.flight.Entertainment = 2: 0 (18/1)
## : : In.flight.Entertainment in {3,4}:
## : : :...Baggage.Handling = 4: 1 (23)
## : : Baggage.Handling = 3:
## : : :...Flight.Distance.log <= 6.944087: 1 (36/7)
## : : Flight.Distance.log > 6.944087: 0 (6)
## : In.flight.Service = 4:
## : :...Flight.Entertainment in {6,8}: 0 (216/53)
## : Flight.Entertainment in {3,5,7,9}: [S39]
## Baggage.Handling in {1,2}:
## :...Food.and.Drink = 0: 1 (0)
## Food.and.Drink = 1: 0 (9)
## Food.and.Drink in {2,3,4,5}:
## :...Gate.Location = 0: 1 (0)
## Gate.Location in {1,3,4}:
## :...Ease.of.Online.Booking = 0: 0 (2)
## : Ease.of.Online.Booking in {2,3,5}: 1 (48/7)
## : Ease.of.Online.Booking = 4:
## : :...In.flight.Wifi.Service = 1: 0 (6)
## : : In.flight.Wifi.Service in {3,4}: 1 (40/7)
## : Ease.of.Online.Booking = 1:
## : :...In.flight.Wifi.Service in {1,3}: 1 (31/2)
## : In.flight.Wifi.Service = 4:
## : :...In.flight.service in {1,2,3,4,5,6,8,9,11,
## : : 12,13,14,
## : : 15}: 1 (9/1)
## : In.flight.service in {7,10}: 0 (5)
## Gate.Location in {2,5}:
## :...In.flight.Entertainment in {2,5}: 0 (11)
## In.flight.Entertainment in {3,4}:
## :...Class = Business: 0 (2)
## Class = Economy Plus: 1 (6)
## Class = Economy:
## :...In.flight.Service = 2: 0 (16/7)
## In.flight.Service = 4: 1 (8)
##
## SubTree [S39]
##
## Departure.and.Arrival.Time.Convenience in {1,3}: 1 (19)
## Departure.and.Arrival.Time.Convenience in {0,2,4,5}:
## :...On.board.Service in {1,2,4}: 0 (16/2)
## On.board.Service = 3: 1 (3)
##
##
## Evaluation on training data (90916 cases):
##
## Decision Tree
## ----------------
## Size Errors
##
## 453 2687( 3.0%) <<
##
##
## (a) (b) <-classified as
## ---- ----
## 50780 679 (a): class 0
## 2008 37449 (b): class 1
##
##
## Attribute usage:
##
## 100.00% In.flight.Wifi.Service
## 89.27% Online.Boarding
## 76.35% Type.of.Travel
## 68.40% Class
## 48.91% Customer.Type
## 38.46% Check.in.Service
## 30.06% Cleanliness
## 29.90% Comfortability
## 24.15% Flight.Entertainment
## 23.02% In.flight.Service
## 22.20% Leg.Room.Service
## 21.70% Gate.Location
## 21.21% In.flight.Entertainment
## 17.59% Baggage.Handling
## 14.84% Ease.of.Online.Booking
## 14.38% Seat.Comfort
## 12.49% On.board.Service
## 6.23% Age.log
## 3.78% Age_Range
## 3.18% In.flight.service
## 2.26% Departure.and.Arrival.Time.Convenience
## 1.75% Food.and.Drink
## 0.90% Flight.Distance.log
## 0.57% Pre.flight.service
## 0.47% Distance_Group
## 0.27% Gender
## 0.25% Departure.Delay
## 0.12% arrival.delay.status
## 0.04% delay.status
## 0.02% Arrival.Delay
##
##
## Time: 1.3 secs
print("Confusion matrix based on testing data")
## [1] "Confusion matrix based on testing data"
pred.test <- predict(model, testing)
gmodels::CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 38964
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 21501 | 492 | 21993 |
## --------------------|-----------|-----------|-----------|
## 1 | 1065 | 15906 | 16971 |
## --------------------|-----------|-----------|-----------|
## Column Total | 22566 | 16398 | 38964 |
## --------------------|-----------|-----------|-----------|
##
##
The result is almost identical to the earlier model because the algorithm gives little weight to the transformed variables: Age.log and Flight.Distance.log are used for only 6.23% and 0.90% of cases, respectively. Further improvement will therefore focus on tuning the model's parameters.
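A quick way to check for overfitting is to compare two error rates. The first is the training error reported by C5.0 (2687 errors in 90916 cases). The second is the error on the testing data, which is the sum of the off-diagonal cells (492 + 1065) of the 38964-case table above. The minimal sketch below uses only those printed counts:

```r
# Counts copied from the C5.0 output and test confusion matrix above
train_errors <- 2687
train_n      <- 90916
test_errors  <- 492 + 1065   # off-diagonal cells of the test confusion matrix
test_n       <- 38964

train_error_rate <- train_errors / train_n
test_error_rate  <- test_errors / test_n

# Side-by-side error rates, rounded to four decimals
round(c(train = train_error_rate, test = test_error_rate), 4)
```

With these counts, the error rises from about 3.0% on the training data to about 4.0% on the testing data. The gap is small enough to suggest that the tree is not severely overfitting.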
Boosting
library(C50)
## Warning: package 'C50' was built under R version 4.4.3
library(gmodels)
## Warning: package 'gmodels' was built under R version 4.4.3
set.seed(46748717)
proportion <- 0.7
split <- initial_split(data_clsf_ori, prop = proportion)
training <- training(split)
testing <- testing(split)
model_boost <- C5.0(Satisfaction ~.,
data = training,
trials = 10)
# Predicting on the testing data
print("Confusion matrix based on testing data (boosting)")
## [1] "Confusion matrix based on testing data (boosting)"
pred.test <- predict(model_boost, testing)
CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 38964
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 21370 | 623 | 21993 |
## --------------------|-----------|-----------|-----------|
## 1 | 980 | 15991 | 16971 |
## --------------------|-----------|-----------|-----------|
## Column Total | 22350 | 16614 | 38964 |
## --------------------|-----------|-----------|-----------|
##
##
# Calculating accuracy with boosting (share of correct predictions on the testing data)
accuracy <- mean(pred.test == testing$Satisfaction)
print(accuracy)
## [1] 0.9588595
Boosting did not increase accuracy significantly.
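Accuracy alone does not show which kind of error a model makes. The sketch below defines a hypothetical helper, `confusion_metrics()` (not part of the original code). It computes accuracy, precision, recall, and specificity from the four cells of a CrossTable output, where rows are actual classes, columns are predicted classes, and 1 means satisfied. Here it is applied to the boosted model's counts above:

```r
# Hypothetical helper: summary metrics for a 2x2 confusion matrix
# (rows = actual 0/1, columns = predicted 0/1, as in the CrossTable output)
confusion_metrics <- function(tn, fp, fn, tp) {
  total <- tn + fp + fn + tp
  c(accuracy    = (tn + tp) / total,
    precision   = tp / (tp + fp),   # share of predicted "satisfied" that are correct
    recall      = tp / (tp + fn),   # share of actually satisfied passengers found
    specificity = tn / (tn + fp))   # share of dissatisfied passengers found
}

# Boosted C5.0 counts from the confusion matrix above
boost <- confusion_metrics(tn = 21370, fp = 623, fn = 980, tp = 15991)
round(boost, 4)
```

The same helper works for every confusion matrix in this section. It shows the trade-off made by a cost matrix (fewer false positives, more false negatives) directly through specificity and recall.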
Assigning Costs to Mistakes
Predicting that a dissatisfied customer is satisfied is more costly than the reverse, so false positives are assigned a higher cost (2) than false negatives (1).
# Specifying the cost matrix
cost.matrix <- matrix(c(NA, 1, # FN: cost of predicting "Dissatisfied" when the actual value is "Satisfied"
2, NA), # FP: cost of predicting "Satisfied" when the actual value is "Dissatisfied"
nrow = 2,
ncol = 2,
byrow = FALSE)
rownames(cost.matrix) <- colnames(cost.matrix) <- c(1, 0)
# Estimating the model with the cost matrix
model.cost <- C5.0(Satisfaction ~.,
data = training,
costs = cost.matrix)
print("Confusion matrix based on testing data (costs)")
## [1] "Confusion matrix based on testing data (costs)"
pred.test <- predict(model.cost, testing)
CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 38964
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 21726 | 267 | 21993 |
## --------------------|-----------|-----------|-----------|
## 1 | 1424 | 15547 | 16971 |
## --------------------|-----------|-----------|-----------|
## Column Total | 23150 | 15814 | 38964 |
## --------------------|-----------|-----------|-----------|
##
##
# Calculating accuracy with the cost matrix
accuracy <- mean(pred.test == testing$Satisfaction)
print(accuracy)
## [1] 0.956601
Accuracy decreases slightly.
# Calculating Decrease in False Positive
decrease_fp = (495 - 238 ) / 495
print(decrease_fp)
## [1] 0.5191919
False positives decrease by 51.9%, from 495 to 238.
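Accuracy treats both errors equally. An alternative way to judge the cost-sensitive model is to weight each confusion matrix by the cost matrix used above, where a false negative costs 1 and a false positive costs 2. The sketch below labels the matrix dimensions explicitly. It assumes that rows are predicted classes and columns are actual classes, which matches the comments in the cost-matrix code. The counts come from the boosted and cost-sensitive tables printed above. Note that these two models differ in more than the cost matrix, because the boosted one also uses `trials = 10`.

```r
# Cost matrix from the analysis with labelled dimensions
# (assumption: rows = predicted class, columns = actual class; diagonal cost 0)
cost <- matrix(c(0, 1,
                 2, 0),
               nrow = 2, byrow = FALSE,
               dimnames = list(predicted = c("1", "0"), actual = c("1", "0")))

# Weighted cost of a model = FN count * FN cost + FP count * FP cost
weighted_cost <- function(fn, fp, cost_matrix) {
  fn * cost_matrix["0", "1"] + fp * cost_matrix["1", "0"]
}

boost_cost <- weighted_cost(fn = 980,  fp = 623, cost)   # boosted C5.0
cs_cost    <- weighted_cost(fn = 1424, fp = 267, cost)   # C5.0 with cost matrix
c(boosted = boost_cost, cost_sensitive = cs_cost)
```

Under this weighting, the cost-sensitive tree scores 1958 against 2226 for the boosted model, even though its accuracy is lower.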
Random Forest
data_randfor <- data_clsf_ori %>% select(-c(Arrival.Delay.Duration, Departure.Delay.Duration, Flight.Distance, Age))
proportion <- 0.7
split <- initial_split(data_randfor, prop = proportion)
training <- training(split)
testing <- testing(split)
model.forest <- randomForest::randomForest(Satisfaction ~.,
data = training,
ntree = 500, # how many trees should be grown?
mtry = 2, # how many variables to sample at each split?
replace = TRUE, # sample cases with replacement
importance = TRUE) # compute variable importance measures
print("Confusion matrix based on testing data (random forest)")
## [1] "Confusion matrix based on testing data (random forest)"
pred.test <- predict(model.forest, testing)
CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 38964
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 21561 | 677 | 22238 |
## --------------------|-----------|-----------|-----------|
## 1 | 1083 | 15643 | 16726 |
## --------------------|-----------|-----------|-----------|
## Column Total | 22644 | 16320 | 38964 |
## --------------------|-----------|-----------|-----------|
##
##
# Calculating accuracy with random forest
accuracy <- mean(pred.test == testing$Satisfaction)
print(accuracy)
## [1] 0.9548301
The random forest reaches an accuracy of about 95.5% on its testing data.
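Next, the ranger package is used to fit a random forest on the full dataset (data_clsf_ori), including the high-cardinality variables removed for randomForest above. randomForest cannot handle categorical predictors with more than 53 levels, whereas ranger has no such limit.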
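Before choosing between the two packages, it helps to list the factor columns whose number of levels exceeds the 53-level limit of randomForest. The sketch below defines a hypothetical helper, `too_many_levels()`, and runs it on a small made-up data frame (`toy`, also hypothetical) in place of `data_clsf_ori`. Whether the real columns are stored as factors is an assumption to check.

```r
# Hypothetical toy data standing in for data_clsf_ori
set.seed(1)
toy <- data.frame(
  Satisfaction    = factor(sample(c("0", "1"), 200, replace = TRUE)),
  Class           = factor(sample(c("Business", "Economy", "Economy Plus"), 200, replace = TRUE)),
  Flight.Distance = factor(sample(1:150, 200, replace = TRUE), levels = 1:150)  # 150 levels
)

# Return the names of factor columns with more levels than randomForest accepts
too_many_levels <- function(df, max_levels = 53) {
  n_levels <- vapply(df, function(col) if (is.factor(col)) nlevels(col) else 0L, integer(1))
  names(n_levels)[n_levels > max_levels]
}

too_many_levels(toy)
```

If the real data returns any column names, those columns must be dropped, binned, or converted to numeric before using randomForest. ranger can use them as they are.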
# install.packages("ranger")
library(ranger)
## Warning: package 'ranger' was built under R version 4.4.3
proportion <- 0.7
split <- initial_split(data_clsf_ori, prop = proportion)
training <- training(split)
testing <- testing(split)
model.ranger <- ranger(Satisfaction ~.,
data = training,
num.trees = 500, # how many trees should be grown?
mtry = 2, # how many variables to sample at each split?
replace = TRUE) # sampling of cases with or without replacement?
print("Confusion matrix based on testing data (ranger)")
## [1] "Confusion matrix based on testing data (ranger)"
pred.test <- predict(model.ranger, testing)$predictions
CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 38964
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 21337 | 642 | 21979 |
## --------------------|-----------|-----------|-----------|
## 1 | 1231 | 15754 | 16985 |
## --------------------|-----------|-----------|-----------|
## Column Total | 22568 | 16396 | 38964 |
## --------------------|-----------|-----------|-----------|
##
##
# Calculating accuracy with ranger
accuracy <- mean(pred.test == testing$Satisfaction)
print(accuracy)
## [1] 0.95193
ranger does not improve accuracy, which stays at about 95.2%.
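To put the models side by side, the accuracies can be recomputed directly from the test confusion matrices printed in this section. The random forest and ranger models were evaluated on different random splits, because no seed is set before those splits. Small differences between models should therefore not be over-interpreted. A minimal sketch:

```r
# Correct predictions (diagonal cells) from each test confusion matrix above;
# every table has 38964 cases
models <- data.frame(
  model   = c("C5.0 (with transformed variables)", "C5.0 boosted (trials = 10)",
              "C5.0 with cost matrix", "randomForest", "ranger"),
  correct = c(21501 + 15906, 21370 + 15991, 21726 + 15547,
              21561 + 15643, 21337 + 15754),
  total   = 38964
)
models$accuracy <- round(models$correct / models$total, 4)

# Sort from most to least accurate
models[order(-models$accuracy), ]
```

In these runs, all five models fall between about 95.2% and 96.0% accuracy. The choice between them can therefore rest on error types, cost, and interpretability rather than accuracy alone.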
library(randomForest)
## Warning: package 'randomForest' was built under R version 4.4.3
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ranger':
##
## importance
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:dplyr':
##
## combine
rf_importance <- importance(model.forest)
print(rf_importance)
## 0 1 MeanDecreaseAccuracy
## Gender 17.977876 13.32974 22.24641
## Customer.Type 43.740193 45.61284 52.14515
## Type.of.Travel 44.432055 54.55586 57.14130
## Class 34.941334 37.79850 41.09677
## Departure.Delay 15.272430 14.28159 21.01826
## Arrival.Delay 13.848610 24.28030 25.83692
## Departure.and.Arrival.Time.Convenience 33.433667 38.36914 43.36646
## Ease.of.Online.Booking 41.794805 33.22311 44.11130
## Check.in.Service 47.839178 29.57097 50.98918
## Online.Boarding 47.767750 34.34686 48.41888
## Gate.Location 24.953126 29.18174 30.78328
## On.board.Service 37.839603 27.45282 38.10920
## Seat.Comfort 40.442853 31.39346 43.28877
## Leg.Room.Service 35.291453 29.63813 36.71174
## Cleanliness 36.140833 29.42201 39.34353
## Food.and.Drink 27.110184 28.35780 31.05048
## In.flight.Service 41.071863 30.00064 42.94016
## In.flight.Wifi.Service 62.300475 37.53590 54.09010
## In.flight.Entertainment 27.756679 31.98421 32.44693
## Baggage.Handling 45.657587 33.13015 50.29197
## Age_Range 15.280094 36.38939 37.10216
## Distance_Group 25.817948 27.50401 31.82476
## departure.delay.status 9.143532 12.44843 14.56679
## arrival.delay.status 5.155945 11.96756 12.01033
## delay.status 9.702037 17.24588 17.13521
## In.flight.service 40.483718 17.20067 37.38414
## Flight.Entertainment 39.524720 25.54681 40.15111
## Pre.flight.service 44.346532 21.86557 42.70714
## Comfortability 39.887094 21.00721 37.27586
## MeanDecreaseGini
## Gender 162.69720
## Customer.Type 1201.00100
## Type.of.Travel 3557.53566
## Class 3345.72971
## Departure.Delay 261.31078
## Arrival.Delay 287.50981
## Departure.and.Arrival.Time.Convenience 1128.51367
## Ease.of.Online.Booking 1561.34330
## Check.in.Service 773.12598
## Online.Boarding 4800.12563
## Gate.Location 886.67192
## On.board.Service 1095.61398
## Seat.Comfort 1447.12480
## Leg.Room.Service 1405.00187
## Cleanliness 889.95546
## Food.and.Drink 590.21231
## In.flight.Service 951.86864
## In.flight.Wifi.Service 4598.24535
## In.flight.Entertainment 1796.83643
## Baggage.Handling 995.76089
## Age_Range 734.84088
## Distance_Group 506.36099
## departure.delay.status 44.40768
## arrival.delay.status 49.01447
## delay.status 61.85161
## In.flight.service 1359.11344
## Flight.Entertainment 2547.86759
## Pre.flight.service 2265.58146
## Comfortability 2107.45147
Plot the importance
varImpPlot(model.forest)
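The two importance measures do not rank the variables the same way. The sketch below hand-copies the top six values of each measure from the printed importance table, so `mda` and `gini` are partial vectors. It then finds which variables appear in the top five of both:

```r
# Top six values of each measure, copied from the importance table above
mda <- c(Type.of.Travel = 57.14130, In.flight.Wifi.Service = 54.09010,
         Customer.Type = 52.14515, Check.in.Service = 50.98918,
         Baggage.Handling = 50.29197, Online.Boarding = 48.41888)
gini <- c(Online.Boarding = 4800.12563, In.flight.Wifi.Service = 4598.24535,
          Type.of.Travel = 3557.53566, Class = 3345.72971,
          Flight.Entertainment = 2547.86759, Customer.Type = 1201.00100)

# Top five variables under each measure
top_mda  <- names(sort(mda,  decreasing = TRUE))[1:5]
top_gini <- names(sort(gini, decreasing = TRUE))[1:5]

# Variables ranked highly by both measures
intersect(top_mda, top_gini)
```

Only Type.of.Travel and In.flight.Wifi.Service rank in the top five under both measures. Online.Boarding leads on MeanDecreaseGini but is only sixth on MeanDecreaseAccuracy. Gini-based importance is known to favour variables with many possible split points. When the two measures disagree, the permutation-based MeanDecreaseAccuracy is usually the safer guide.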
Satisfaction is high in the following groups:
- Type of Travel: Business
- Class: Business
- Customer Type: Returning
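Each subgroup analysis below repeats the same preparation steps. It filters on one variable, drops the raw duration, distance, and age columns together with the filtering variable, and then splits the data. A hypothetical helper, `prepare_subgroup()` (not part of the original code), could factor these steps out. The sketch below uses base R only and runs on a small made-up data frame (`toy`):

```r
# Hypothetical helper: keep rows where `column` is in `values`, then drop the
# raw columns excluded in the analysis plus the filtering column itself
prepare_subgroup <- function(data, column, values,
                             drop = c("Arrival.Delay.Duration", "Departure.Delay.Duration",
                                      "Flight.Distance", "Age")) {
  rows <- data[data[[column]] %in% values, , drop = FALSE]
  rows[, setdiff(names(rows), c(drop, column)), drop = FALSE]
}

# Toy data with a few of the analysis columns (values are made up)
toy <- data.frame(
  Satisfaction    = factor(c("1", "0", "1", "0")),
  Class           = c("Business", "Economy", "Economy Plus", "Business"),
  Age             = c(34, 51, 22, 45),
  Flight.Distance = c(800, 1200, 300, 2500),
  Seat.Comfort    = c(4, 2, 3, 5)
)

economy <- prepare_subgroup(toy, "Class", c("Economy", "Economy Plus"))
economy
```

Each subgroup block could then start with a call such as `prepare_subgroup(data_clsf_ori, "Type.of.Travel", "Business")` before `initial_split()`.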
Type of Travel: Business
data_randfor <- data_clsf_ori[ which(data_clsf_ori$Type.of.Travel == "Business"), ]
data_randfor <- data_randfor %>% select(-c(Arrival.Delay.Duration, Departure.Delay.Duration,
Flight.Distance, Age, Type.of.Travel))
proportion <- 0.7
split <- initial_split(data_randfor, prop = proportion)
training <- training(split)
testing <- testing(split)
model.forest <- randomForest::randomForest(Satisfaction ~.,
data = training,
ntree = 500, # how many trees should be grown?
mtry = 2, # how many variables to sample at each split?
replace = TRUE, # sample cases with replacement
importance = TRUE) # compute variable importance measures
print("Confusion matrix for Random Forest Type of Travel Business")
## [1] "Confusion matrix for Random Forest Type of Travel Business"
pred.test <- predict(model.forest, testing)
CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 26908
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 10585 | 675 | 11260 |
## --------------------|-----------|-----------|-----------|
## 1 | 542 | 15106 | 15648 |
## --------------------|-----------|-----------|-----------|
## Column Total | 11127 | 15781 | 26908 |
## --------------------|-----------|-----------|-----------|
##
##
varImpPlot(model.forest)
Class: Business
data_randfor <- data_clsf_ori[ which(data_clsf_ori$Class == "Business"), ]
data_randfor <- data_randfor %>% select(-c(Arrival.Delay.Duration, Departure.Delay.Duration,
Flight.Distance, Age, Class))
proportion <- 0.7
split <- initial_split(data_randfor, prop = proportion)
training <- training(split)
testing <- testing(split)
model.forest <- randomForest::randomForest(Satisfaction ~.,
data = training,
ntree = 500, # how many trees should be grown?
mtry = 2, # how many variables to sample at each split?
replace = TRUE, # sample cases with replacement
importance = TRUE) # compute variable importance measures
print("Confusion matrix for Random Forest Class Business")
## [1] "Confusion matrix for Random Forest Class Business"
pred.test <- predict(model.forest, testing)
CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 18648
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 5283 | 418 | 5701 |
## --------------------|-----------|-----------|-----------|
## 1 | 198 | 12749 | 12947 |
## --------------------|-----------|-----------|-----------|
## Column Total | 5481 | 13167 | 18648 |
## --------------------|-----------|-----------|-----------|
##
##
varImpPlot(model.forest)
Customer Type: Returning
data_randfor <- data_clsf_ori[ which(data_clsf_ori$Customer.Type == "Returning"), ]
data_randfor <- data_randfor %>% select(-c(Arrival.Delay.Duration, Departure.Delay.Duration,
Flight.Distance, Age, Customer.Type))
proportion <- 0.7
split <- initial_split(data_randfor, prop = proportion)
training <- training(split)
testing <- testing(split)
model.forest <- randomForest::randomForest(Satisfaction ~.,
data = training,
ntree = 500, # how many trees should be grown?
mtry = 2, # how many variables to sample at each split?
replace = TRUE, # sample cases with replacement
importance = TRUE) # compute variable importance measures
print("Confusion matrix for Random Forest Customer Type Returning")
## [1] "Confusion matrix for Random Forest Customer Type Returning"
pred.test <- predict(model.forest, testing)
CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 31830
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 16200 | 359 | 16559 |
## --------------------|-----------|-----------|-----------|
## 1 | 830 | 14441 | 15271 |
## --------------------|-----------|-----------|-----------|
## Column Total | 17030 | 14800 | 31830 |
## --------------------|-----------|-----------|-----------|
##
##
varImpPlot(model.forest)
Satisfaction is low in the following groups:
- Type of Travel: Personal
- Class: Economy and Economy Plus
- Customer Type: First-time
Type of Travel: Personal
data_randfor <- data_clsf_ori[ which(data_clsf_ori$Type.of.Travel == "Personal"), ]
data_randfor <- data_randfor %>% select(-c(Arrival.Delay.Duration, Departure.Delay.Duration,
Flight.Distance, Age, Type.of.Travel))
proportion <- 0.7
split <- initial_split(data_randfor, prop = proportion)
training <- training(split)
testing <- testing(split)
model.forest <- randomForest::randomForest(Satisfaction ~.,
data = training,
ntree = 500, # how many trees should be grown?
mtry = 2, # how many variables to sample at each split?
replace = TRUE, # sample cases with replacement
importance = TRUE) # compute variable importance measures
print("Confusion matrix for Random Forest Type of Travel Personal")
## [1] "Confusion matrix for Random Forest Type of Travel Personal"
pred.test <- predict(model.forest, testing)
CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 12057
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 10865 | 2 | 10867 |
## --------------------|-----------|-----------|-----------|
## 1 | 475 | 715 | 1190 |
## --------------------|-----------|-----------|-----------|
## Column Total | 11340 | 717 | 12057 |
## --------------------|-----------|-----------|-----------|
##
##
varImpPlot(model.forest)
Class: Economy and Economy Plus
data_randfor <- data_clsf_ori[ which(data_clsf_ori$Class %in% c("Economy", "Economy Plus")), ]
data_randfor <- data_randfor %>% select(-c(Arrival.Delay.Duration, Departure.Delay.Duration,
Flight.Distance, Age, Class))
proportion <- 0.7
split <- initial_split(data_randfor, prop = proportion)
training <- training(split)
testing <- testing(split)
model.forest <- randomForest::randomForest(Satisfaction ~.,
data = training,
ntree = 500, # how many trees should be grown?
mtry = 2, # how many variables to sample at each split?
replace = TRUE, # sample cases with replacement
importance = TRUE) # compute variable importance measures
print("Confusion matrix for Random Forest Class Economy and Economy Plus")
## [1] "Confusion matrix for Random Forest Class Economy and Economy Plus"
pred.test <- predict(model.forest, testing)
CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 20316
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 16088 | 308 | 16396 |
## --------------------|-----------|-----------|-----------|
## 1 | 892 | 3028 | 3920 |
## --------------------|-----------|-----------|-----------|
## Column Total | 16980 | 3336 | 20316 |
## --------------------|-----------|-----------|-----------|
##
##
varImpPlot(model.forest)
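Setting `importance = TRUE` adds the MeanDecreaseAccuracy panel to `varImpPlot()`. It measures how much accuracy drops when one predictor's values are randomly shuffled. randomForest computes this on each tree's out-of-bag cases. The sketch below only illustrates the underlying idea. It uses simulated data, swaps in a logistic regression for the forest, and measures accuracy in-sample, so it is not randomForest's exact procedure.

```r
# Toy data: one informative predictor and one pure-noise predictor (simulated)
set.seed(1)
n <- 2000
toy <- data.frame(x_strong = rnorm(n), x_noise = rnorm(n))
toy$y <- rbinom(n, 1, plogis(2 * toy$x_strong))

# A logistic regression stands in for the forest here
fit <- glm(y ~ x_strong + x_noise, data = toy, family = binomial)

# Classification accuracy at a 0.5 probability threshold
accuracy <- function(model, data) {
  mean((predict(model, newdata = data, type = "response") > 0.5) == (data$y == 1))
}

# Importance = average drop in accuracy after shuffling one predictor
perm_importance <- function(model, data, vars, n_rep = 5) {
  base_acc <- accuracy(model, data)
  sapply(vars, function(v) {
    drops <- replicate(n_rep, {
      shuffled <- data
      shuffled[[v]] <- sample(shuffled[[v]])
      base_acc - accuracy(model, shuffled)
    })
    mean(drops)
  })
}

imp <- perm_importance(fit, toy, c("x_strong", "x_noise"))
imp   # large drop for x_strong, roughly zero for x_noise
```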
Lorem Ipsum
Customer Type First Time
data_randfor <- data_clsf_ori[ which(data_clsf_ori$Customer.Type %in% "First-time"), ]
data_randfor <- data_randfor %>% select(-c(Arrival.Delay.Duration, Departure.Delay.Duration,
Flight.Distance, Age, Customer.Type))
proportion <- 0.7
split <- initial_split(data_randfor, prop = proportion)
training <- training(split)
testing <- testing(split)
model.forest <- randomForest::randomForest(Satisfaction ~.,
data = training,
ntree = 500, # how many trees should be grown?
mtry = 2, # how many variables to sample at each split?
replace = TRUE, # sample cases with replacement (bootstrap)?
importance = TRUE) # also compute permutation importance (MeanDecreaseAccuracy)
print("Confusion matrix for Random Forest (Customer Type: First-time)")
## [1] "Confusion matrix for Random Forest (Customer Type: First-time)"
pred.test <- predict(model.forest, testing)
CrossTable(testing$Satisfaction, pred.test,
prop.chisq = FALSE,
prop.c = FALSE,
prop.r = FALSE,
prop.t = FALSE,
dnn = c("Actual Satisfaction", "Predicted Satisfaction"))
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 7134
##
##
## | Predicted Satisfaction
## Actual Satisfaction | 0 | 1 | Row Total |
## --------------------|-----------|-----------|-----------|
## 0 | 5226 | 226 | 5452 |
## --------------------|-----------|-----------|-----------|
## 1 | 244 | 1438 | 1682 |
## --------------------|-----------|-----------|-----------|
## Column Total | 5470 | 1664 | 7134 |
## --------------------|-----------|-----------|-----------|
##
##
varImpPlot(model.forest)
Lorem Ipsum
Why Logistic Regression?
# Split training and testing data
set.seed(46748756)
library(rsample)
# Create copy of dataset for the purpose of classification only, convert to factor
data_logreg <- data %>%
mutate(across(-c(ID, Age, Flight.Distance, Departure.Delay, Arrival.Delay,
Departure.Delay.Duration,
Arrival.Delay.Duration, Arrival.Delay.Duration.log, Arrival.Delay.Duration.log.z,
Departure.Delay.Duration.log, Departure.Delay.Duration.log.z, Flight.Distance.log,
Flight.Distance.log.z, Age.log, Age.log.z), as.factor)) %>% subset(
select = -c(ID, Departure.Delay.Duration,
Arrival.Delay.Duration, Arrival.Delay.Duration.log, Arrival.Delay.Duration.log.z,
Departure.Delay.Duration.log, Departure.Delay.Duration.log.z)
)
# Remove rows with a score of 0 in these columns: 0 scores are very rare (most occur only once), so they cannot be split across the training and testing sets
cols_to_clean <- c("On.board.Service", "Check.in.Service", "Seat.Comfort",
"In.flight.Service", "Gate.Location", "Cleanliness",
"In.flight.Entertainment")
for (col in cols_to_clean) {
data_logreg <- subset(data_logreg, data_logreg[[col]] != "0")
data_logreg[[col]] <- droplevels(data_logreg[[col]])
}
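The loop above removes 0 scores from a fixed list of columns. To decide which columns actually need this cleaning, each column's rare levels can be listed first, for example with `rare_levels(data_logreg, cols_to_clean)` before the loop runs. The helper below is hypothetical, and the toy ratings are not the airline data.

```r
# List the levels of each column that occur fewer than `min_n` times
# (for factors, levels with zero occurrences are listed as well)
rare_levels <- function(df, cols, min_n = 2) {
  res <- lapply(cols, function(col) {
    counts <- table(df[[col]])
    names(counts)[counts < min_n]
  })
  setNames(res, cols)
}

# Toy ratings: a single "0" in Seat.Comfort, nothing rare in Cleanliness
toy <- data.frame(
  Seat.Comfort = factor(c("0", rep(c("1", "2", "3"), each = 5))),
  Cleanliness  = factor(rep(c("4", "5"), each = 8))
)
rl <- rare_levels(toy, c("Seat.Comfort", "Cleanliness"))
rl
```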
proportion <- 0.7
split <- initial_split(data_logreg, prop = proportion)
training <- training(split)
testing <- testing(split)
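The cleaning matters because `glm()` drops unused factor levels when it fits the model. If the testing set contains a value that never appeared in training, `predict()` fails with a "factor ... has new levels" error. A check like the one below can be run right after the split. Because `training()` and `testing()` keep the full level sets, the helper compares observed values rather than `levels()`. The function name is hypothetical.

```r
# For each factor column, list values present in `testing` but absent from `training`
unseen_levels <- function(training, testing) {
  factor_cols <- names(testing)[vapply(testing, is.factor, logical(1))]
  res <- lapply(factor_cols, function(col) {
    setdiff(unique(as.character(testing[[col]])),
            unique(as.character(training[[col]])))
  })
  names(res) <- factor_cols
  Filter(length, res)  # keep only columns with at least one unseen value
}

# Toy example: level "5" appears only in the testing rows
lv <- c("1", "2", "3", "5")
train_toy <- data.frame(Gate.Location = factor(c("1", "2", "3", "2"), levels = lv))
test_toy  <- data.frame(Gate.Location = factor(c("2", "5"), levels = lv))
u <- unseen_levels(train_toy, test_toy)
u
```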
Lorem Ipsum
# Fit the model using all available predictors
model_log <- glm(Satisfaction ~.,
data = training, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model_log)
##
## Call:
## glm(formula = Satisfaction ~ ., family = "binomial", data = training)
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.777e+00 6.527e+03 0.000 0.999661
## GenderMale 6.166e-02 3.084e-02 1.999 0.045572
## Age -3.875e-02 8.204e-03 -4.723 2.32e-06
## Customer.TypeReturning 4.032e+00 5.869e-02 68.708 < 2e-16
## Type.of.TravelPersonal -4.932e+00 6.496e-02 -75.920 < 2e-16
## ClassEconomy -4.940e-01 4.151e-02 -11.903 < 2e-16
## ClassEconomy Plus -6.143e-01 6.612e-02 -9.290 < 2e-16
## Flight.Distance -5.745e-06 7.389e-05 -0.078 0.938024
## Departure.Delay -1.758e-04 4.588e-03 -0.038 0.969442
## Arrival.Delay -4.002e-02 4.519e-03 -8.856 < 2e-16
## Departure.and.Arrival.Time.Convenience1 4.328e-01 1.017e-01 4.256 2.08e-05
## Departure.and.Arrival.Time.Convenience2 5.734e-01 9.775e-02 5.866 4.46e-09
## Departure.and.Arrival.Time.Convenience3 4.066e-01 9.420e-02 4.317 1.58e-05
## Departure.and.Arrival.Time.Convenience4 -5.168e-01 8.399e-02 -6.153 7.60e-10
## Departure.and.Arrival.Time.Convenience5 -6.708e-01 9.149e-02 -7.332 2.26e-13
## Ease.of.Online.Booking1 3.378e+00 8.984e-01 3.760 0.000170
## Ease.of.Online.Booking2 3.428e+00 9.004e-01 3.807 0.000141
## Ease.of.Online.Booking3 4.002e+00 9.037e-01 4.428 9.52e-06
## Ease.of.Online.Booking4 4.742e+00 9.091e-01 5.216 1.83e-07
## Ease.of.Online.Booking5 4.225e+00 9.160e-01 4.612 3.99e-06
## Check.in.Service2 2.348e-02 7.273e-02 0.323 0.746801
## Check.in.Service3 5.088e-01 1.002e-01 5.075 3.87e-07
## Check.in.Service4 3.754e-01 1.391e-01 2.699 0.006954
## Check.in.Service5 1.346e+00 1.803e-01 7.464 8.39e-14
## Online.Boarding1 -3.561e+00 9.080e-01 -3.922 8.79e-05
## Online.Boarding2 -3.692e+00 9.075e-01 -4.068 4.73e-05
## Online.Boarding3 -4.120e+00 9.086e-01 -4.534 5.78e-06
## Online.Boarding4 -2.636e+00 9.111e-01 -2.893 0.003815
## Online.Boarding5 -1.394e+00 9.157e-01 -1.522 0.128014
## Gate.Location2 4.532e-02 6.609e-02 0.686 0.492905
## Gate.Location3 -2.009e-01 6.109e-02 -3.289 0.001005
## Gate.Location4 -4.086e-01 6.317e-02 -6.468 9.92e-11
## Gate.Location5 -7.222e-01 8.194e-02 -8.814 < 2e-16
## On.board.Service2 1.214e+00 5.017e+02 0.002 0.998070
## On.board.Service3 2.965e+00 1.003e+03 0.003 0.997642
## On.board.Service4 4.235e+00 1.505e+03 0.003 0.997755
## On.board.Service5 6.091e+00 2.007e+03 0.003 0.997578
## Seat.Comfort2 -5.290e-01 8.346e-02 -6.339 2.31e-10
## Seat.Comfort3 -1.701e+00 8.555e-02 -19.882 < 2e-16
## Seat.Comfort4 -1.309e+00 9.551e-02 -13.707 < 2e-16
## Seat.Comfort5 -6.945e-01 1.104e-01 -6.290 3.18e-10
## Leg.Room.Service1 -2.268e+00 9.660e-01 -2.348 0.018889
## Leg.Room.Service2 -2.301e+00 9.663e-01 -2.382 0.017240
## Leg.Room.Service3 -2.765e+00 9.672e-01 -2.859 0.004254
## Leg.Room.Service4 -2.277e+00 9.686e-01 -2.351 0.018747
## Leg.Room.Service5 -2.112e+00 9.698e-01 -2.178 0.029410
## Cleanliness2 -1.382e-01 8.383e-02 -1.649 0.099162
## Cleanliness3 2.993e-02 8.680e-02 0.345 0.730229
## Cleanliness4 -2.137e-01 9.718e-02 -2.199 0.027888
## Cleanliness5 3.810e-01 1.163e-01 3.275 0.001055
## Food.and.Drink1 -8.659e-01 5.017e+02 -0.002 0.998623
## Food.and.Drink2 6.922e-01 1.003e+03 0.001 0.999450
## Food.and.Drink3 1.830e+00 1.505e+03 0.001 0.999030
## Food.and.Drink4 3.243e+00 2.007e+03 0.002 0.998711
## Food.and.Drink5 4.343e+00 2.509e+03 0.002 0.998619
## In.flight.Service2 7.689e-01 5.017e+02 0.002 0.998777
## In.flight.Service3 1.363e+00 1.003e+03 0.001 0.998917
## In.flight.Service4 3.262e+00 1.505e+03 0.002 0.998271
## In.flight.Service5 5.297e+00 2.007e+03 0.003 0.997894
## In.flight.Wifi.Service1 -2.784e+01 8.926e+01 -0.312 0.755105
## In.flight.Wifi.Service2 -2.922e+01 9.204e+01 -0.317 0.750904
## In.flight.Wifi.Service3 -2.982e+01 1.033e+02 -0.289 0.772789
## In.flight.Wifi.Service4 -2.812e+01 1.207e+02 -0.233 0.815752
## In.flight.Wifi.Service5 -1.917e+01 1.419e+02 -0.135 0.892550
## In.flight.Entertainment2 -1.696e-01 2.909e+01 -0.006 0.995348
## In.flight.Entertainment3 -3.219e-01 5.819e+01 -0.006 0.995586
## In.flight.Entertainment4 -7.625e-01 8.728e+01 -0.009 0.993030
## In.flight.Entertainment5 -1.568e+00 1.164e+02 -0.013 0.989250
## Baggage.Handling2 -3.733e-01 8.441e-02 -4.423 9.74e-06
## Baggage.Handling3 -1.047e+00 7.934e-02 -13.195 < 2e-16
## Baggage.Handling4 -4.217e-01 7.681e-02 -5.490 4.03e-08
## Baggage.Handling5 2.465e-01 8.126e-02 3.034 0.002412
## Age_RangeChildren 8.444e-01 2.012e-01 4.197 2.71e-05
## Age_RangeSenior -4.023e-01 6.746e-02 -5.963 2.48e-09
## Age_RangeYouth 5.662e-01 7.494e-02 7.555 4.18e-14
## Distance_GroupMedium-haul 1.742e-01 9.225e-02 1.889 0.058930
## Distance_GroupShort-haul 3.827e-02 1.246e-01 0.307 0.758844
## departure.delay.statusOn Time -3.114e-02 1.432e-01 -0.217 0.827877
## arrival.delay.statusOn Time -4.102e-01 1.334e-01 -3.075 0.002104
## delay.statusOn-time 1.246e-01 1.330e-01 0.937 0.348942
## In.flight.service3 1.419e+01 6.021e+03 0.002 0.998120
## In.flight.service4 1.374e+01 5.519e+03 0.002 0.998014
## In.flight.service5 1.304e+01 5.017e+03 0.003 0.997926
## In.flight.service6 1.185e+01 4.516e+03 0.003 0.997907
## In.flight.service7 1.064e+01 4.014e+03 0.003 0.997886
## In.flight.service8 9.366e+00 3.512e+03 0.003 0.997872
## In.flight.service9 8.122e+00 3.010e+03 0.003 0.997847
## In.flight.service10 6.903e+00 2.509e+03 0.003 0.997805
## In.flight.service11 5.745e+00 2.007e+03 0.003 0.997716
## In.flight.service12 4.426e+00 1.505e+03 0.003 0.997654
## In.flight.service13 3.179e+00 1.003e+03 0.003 0.997472
## In.flight.service14 1.552e+00 5.017e+02 0.003 0.997532
## In.flight.service15 NA NA NA NA
## Flight.Entertainment2 -5.196e-01 2.327e+02 -0.002 0.998219
## Flight.Entertainment3 3.938e+00 2.037e+02 0.019 0.984573
## Flight.Entertainment4 4.065e+00 1.746e+02 0.023 0.981420
## Flight.Entertainment5 5.748e+00 1.455e+02 0.040 0.968483
## Flight.Entertainment6 6.006e+00 1.164e+02 0.052 0.958836
## Flight.Entertainment7 6.856e+00 8.728e+01 0.079 0.937385
## Flight.Entertainment8 5.703e+00 5.819e+01 0.098 0.921925
## Flight.Entertainment9 6.745e+00 2.910e+01 0.232 0.816693
## Flight.Entertainment10 NA NA NA NA
## Pre.flight.service2 4.683e-01 7.425e-01 0.631 0.528198
## Pre.flight.service3 1.416e-01 5.252e-01 0.270 0.787483
## Pre.flight.service4 2.737e-01 4.690e-01 0.584 0.559528
## Pre.flight.service5 2.271e-01 4.249e-01 0.534 0.593018
## Pre.flight.service6 3.109e-01 3.876e-01 0.802 0.422472
## Pre.flight.service7 1.884e-01 3.488e-01 0.540 0.589112
## Pre.flight.service8 4.232e-01 3.142e-01 1.347 0.178074
## Pre.flight.service9 5.131e-01 2.795e-01 1.835 0.066433
## Pre.flight.service10 7.408e-01 2.509e-01 2.952 0.003156
## Pre.flight.service11 8.179e-01 2.248e-01 3.638 0.000274
## Pre.flight.service12 8.412e-01 2.076e-01 4.051 5.09e-05
## Pre.flight.service13 2.574e-01 1.927e-01 1.336 0.181640
## Pre.flight.service14 4.242e-01 2.118e-01 2.003 0.045224
## Pre.flight.service15 NA NA NA NA
## Comfortability4 -6.245e-01 2.430e-01 -2.570 0.010184
## Comfortability5 -5.737e-01 2.025e-01 -2.832 0.004622
## Comfortability6 -7.568e-01 1.851e-01 -4.090 4.32e-05
## Comfortability7 -5.319e-01 1.630e-01 -3.263 0.001103
## Comfortability8 -4.695e-01 1.536e-01 -3.057 0.002239
## Comfortability9 -1.848e-01 1.351e-01 -1.368 0.171231
## Comfortability10 1.096e-01 1.292e-01 0.848 0.396411
## Comfortability11 5.521e-01 1.159e-01 4.764 1.90e-06
## Comfortability12 7.164e-01 1.134e-01 6.319 2.64e-10
## Comfortability13 8.309e-01 1.070e-01 7.768 7.99e-15
## Comfortability14 4.317e-01 1.219e-01 3.542 0.000397
## Comfortability15 NA NA NA NA
## Flight.Distance.log -3.111e-03 5.159e-02 -0.060 0.951919
## Age.log 1.909e+00 3.432e-01 5.562 2.67e-08
## Flight.Distance.log.z NA NA NA NA
## Age.log.z NA NA NA NA
##
## (Intercept)
## GenderMale *
## Age ***
## Customer.TypeReturning ***
## Type.of.TravelPersonal ***
## ClassEconomy ***
## ClassEconomy Plus ***
## Flight.Distance
## Departure.Delay
## Arrival.Delay ***
## Departure.and.Arrival.Time.Convenience1 ***
## Departure.and.Arrival.Time.Convenience2 ***
## Departure.and.Arrival.Time.Convenience3 ***
## Departure.and.Arrival.Time.Convenience4 ***
## Departure.and.Arrival.Time.Convenience5 ***
## Ease.of.Online.Booking1 ***
## Ease.of.Online.Booking2 ***
## Ease.of.Online.Booking3 ***
## Ease.of.Online.Booking4 ***
## Ease.of.Online.Booking5 ***
## Check.in.Service2
## Check.in.Service3 ***
## Check.in.Service4 **
## Check.in.Service5 ***
## Online.Boarding1 ***
## Online.Boarding2 ***
## Online.Boarding3 ***
## Online.Boarding4 **
## Online.Boarding5
## Gate.Location2
## Gate.Location3 **
## Gate.Location4 ***
## Gate.Location5 ***
## On.board.Service2
## On.board.Service3
## On.board.Service4
## On.board.Service5
## Seat.Comfort2 ***
## Seat.Comfort3 ***
## Seat.Comfort4 ***
## Seat.Comfort5 ***
## Leg.Room.Service1 *
## Leg.Room.Service2 *
## Leg.Room.Service3 **
## Leg.Room.Service4 *
## Leg.Room.Service5 *
## Cleanliness2 .
## Cleanliness3
## Cleanliness4 *
## Cleanliness5 **
## Food.and.Drink1
## Food.and.Drink2
## Food.and.Drink3
## Food.and.Drink4
## Food.and.Drink5
## In.flight.Service2
## In.flight.Service3
## In.flight.Service4
## In.flight.Service5
## In.flight.Wifi.Service1
## In.flight.Wifi.Service2
## In.flight.Wifi.Service3
## In.flight.Wifi.Service4
## In.flight.Wifi.Service5
## In.flight.Entertainment2
## In.flight.Entertainment3
## In.flight.Entertainment4
## In.flight.Entertainment5
## Baggage.Handling2 ***
## Baggage.Handling3 ***
## Baggage.Handling4 ***
## Baggage.Handling5 **
## Age_RangeChildren ***
## Age_RangeSenior ***
## Age_RangeYouth ***
## Distance_GroupMedium-haul .
## Distance_GroupShort-haul
## departure.delay.statusOn Time
## arrival.delay.statusOn Time **
## delay.statusOn-time
## In.flight.service3
## In.flight.service4
## In.flight.service5
## In.flight.service6
## In.flight.service7
## In.flight.service8
## In.flight.service9
## In.flight.service10
## In.flight.service11
## In.flight.service12
## In.flight.service13
## In.flight.service14
## In.flight.service15
## Flight.Entertainment2
## Flight.Entertainment3
## Flight.Entertainment4
## Flight.Entertainment5
## Flight.Entertainment6
## Flight.Entertainment7
## Flight.Entertainment8
## Flight.Entertainment9
## Flight.Entertainment10
## Pre.flight.service2
## Pre.flight.service3
## Pre.flight.service4
## Pre.flight.service5
## Pre.flight.service6
## Pre.flight.service7
## Pre.flight.service8
## Pre.flight.service9 .
## Pre.flight.service10 **
## Pre.flight.service11 ***
## Pre.flight.service12 ***
## Pre.flight.service13
## Pre.flight.service14 *
## Pre.flight.service15
## Comfortability4 *
## Comfortability5 **
## Comfortability6 ***
## Comfortability7 **
## Comfortability8 **
## Comfortability9
## Comfortability10
## Comfortability11 ***
## Comfortability12 ***
## Comfortability13 ***
## Comfortability14 ***
## Comfortability15
## Flight.Distance.log
## Age.log ***
## Flight.Distance.log.z
## Age.log.z
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 124461 on 90901 degrees of freedom
## Residual deviance: 29296 on 90776 degrees of freedom
## AIC: 29548
##
## Number of Fisher Scoring iterations: 17
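Two features of this output have standard explanations. First, "6 not defined because of singularities" means some columns are exact linear combinations of others. For example, `Age.log.z` is a rescaled `Age.log`, so `glm()` reports `NA` for them. Second, the warning "fitted probabilities numerically 0 or 1 occurred" usually signals (quasi-)complete separation. It tends to come with very large standard errors and p-values near 1, as seen for `On.board.Service` and `Food.and.Drink`. The toy sketch below reproduces both effects on simulated data.

```r
set.seed(42)
n <- 500
x <- rnorm(n)
toy <- data.frame(
  x   = x,
  x_z = as.numeric(scale(x)),   # exact linear transform of x
  y   = rbinom(n, 1, plogis(x))
)

# (1) Aliasing: x_z carries no information beyond x, so its coefficient is NA
fit_alias <- glm(y ~ x + x_z, data = toy, family = binomial)
coef(fit_alias)

# (2) Separation: x2 > 5 predicts y2 perfectly; collect the warnings glm() raises
sep <- data.frame(x2 = 1:10, y2 = as.integer(1:10 > 5))
warns <- character(0)
fit_sep <- withCallingHandlers(
  glm(y2 ~ x2, data = sep, family = binomial),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
warns
summary(fit_sep)$coefficients   # huge standard error, p-value close to 1
```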
Lorem Ipsum
Lorem Ipsum
Interpret the coefficients (odds ratios)
exp(coef(model_log))
## (Intercept) GenderMale
## 1.606734e+01 1.063596e+00
## Age Customer.TypeReturning
## 9.619913e-01 5.639014e+01
## Type.of.TravelPersonal ClassEconomy
## 7.214896e-03 6.101599e-01
## ClassEconomy Plus Flight.Distance
## 5.410279e-01 9.999943e-01
## Departure.Delay Arrival.Delay
## 9.998242e-01 9.607716e-01
## Departure.and.Arrival.Time.Convenience1 Departure.and.Arrival.Time.Convenience2
## 1.541526e+00 1.774291e+00
## Departure.and.Arrival.Time.Convenience3 Departure.and.Arrival.Time.Convenience4
## 1.501776e+00 5.964440e-01
## Departure.and.Arrival.Time.Convenience5 Ease.of.Online.Booking1
## 5.113032e-01 2.931603e+01
## Ease.of.Online.Booking2 Ease.of.Online.Booking3
## 3.080414e+01 5.468144e+01
## Ease.of.Online.Booking4 Ease.of.Online.Booking5
## 1.146149e+02 6.834061e+01
## Check.in.Service2 Check.in.Service3
## 1.023760e+00 1.663266e+00
## Check.in.Service4 Check.in.Service5
## 1.455645e+00 3.841455e+00
## Online.Boarding1 Online.Boarding2
## 2.841278e-02 2.491920e-02
## Online.Boarding3 Online.Boarding4
## 1.624730e-02 7.166094e-02
## Online.Boarding5 Gate.Location2
## 2.481480e-01 1.046363e+00
## Gate.Location3 Gate.Location4
## 8.179761e-01 6.645745e-01
## Gate.Location5 On.board.Service2
## 4.856807e-01 3.365663e+00
## On.board.Service3 On.board.Service4
## 1.939865e+01 6.908227e+01
## On.board.Service5 Seat.Comfort2
## 4.419085e+02 5.891685e-01
## Seat.Comfort3 Seat.Comfort4
## 1.825227e-01 2.700321e-01
## Seat.Comfort5 Leg.Room.Service1
## 4.993174e-01 1.035280e-01
## Leg.Room.Service2 Leg.Room.Service3
## 1.001400e-01 6.298349e-02
## Leg.Room.Service4 Leg.Room.Service5
## 1.026248e-01 1.209661e-01
## Cleanliness2 Cleanliness3
## 8.708996e-01 1.030383e+00
## Cleanliness4 Cleanliness5
## 8.075948e-01 1.463767e+00
## Food.and.Drink1 Food.and.Drink2
## 4.206813e-01 1.998054e+00
## Food.and.Drink3 Food.and.Drink4
## 6.234229e+00 2.559867e+01
## Food.and.Drink5 In.flight.Service2
## 7.692282e+01 2.157482e+00
## In.flight.Service3 In.flight.Service4
## 3.906474e+00 2.610313e+01
## In.flight.Service5 In.flight.Wifi.Service1
## 1.996861e+02 8.098210e-13
## In.flight.Wifi.Service2 In.flight.Wifi.Service3
## 2.042559e-13 1.115767e-13
## In.flight.Wifi.Service4 In.flight.Wifi.Service5
## 6.154799e-13 4.721111e-09
## In.flight.Entertainment2 In.flight.Entertainment3
## 8.439913e-01 7.247804e-01
## In.flight.Entertainment4 In.flight.Entertainment5
## 4.664997e-01 2.084827e-01
## Baggage.Handling2 Baggage.Handling3
## 6.884326e-01 3.510340e-01
## Baggage.Handling4 Baggage.Handling5
## 6.559623e-01 1.279593e+00
## Age_RangeChildren Age_RangeSenior
## 2.326476e+00 6.687971e-01
## Age_RangeYouth Distance_GroupMedium-haul
## 1.761509e+00 1.190338e+00
## Distance_GroupShort-haul departure.delay.statusOn Time
## 1.039007e+00 9.693412e-01
## arrival.delay.statusOn Time delay.statusOn-time
## 6.634961e-01 1.132657e+00
## In.flight.service3 In.flight.service4
## 1.448637e+06 9.227612e+05
## In.flight.service5 In.flight.service6
## 4.619452e+05 1.398340e+05
## In.flight.service7 In.flight.service8
## 4.157187e+04 1.167990e+04
## In.flight.service9 In.flight.service10
## 3.366921e+03 9.953502e+02
## In.flight.service11 In.flight.service12
## 3.127305e+02 8.361719e+01
## In.flight.service13 In.flight.service14
## 2.402339e+01 4.720917e+00
## In.flight.service15 Flight.Entertainment2
## NA 5.947767e-01
## Flight.Entertainment3 Flight.Entertainment4
## 5.130783e+01 5.827607e+01
## Flight.Entertainment5 Flight.Entertainment6
## 3.134282e+02 4.060387e+02
## Flight.Entertainment7 Flight.Entertainment8
## 9.500134e+02 2.997443e+02
## Flight.Entertainment9 Flight.Entertainment10
## 8.495626e+02 NA
## Pre.flight.service2 Pre.flight.service3
## 1.597293e+00 1.152099e+00
## Pre.flight.service4 Pre.flight.service5
## 1.314784e+00 1.254975e+00
## Pre.flight.service6 Pre.flight.service7
## 1.364702e+00 1.207291e+00
## Pre.flight.service8 Pre.flight.service9
## 1.526785e+00 1.670428e+00
## Pre.flight.service10 Pre.flight.service11
## 2.097569e+00 2.265752e+00
## Pre.flight.service12 Pre.flight.service13
## 2.319168e+00 1.293577e+00
## Pre.flight.service14 Pre.flight.service15
## 1.528342e+00 NA
## Comfortability4 Comfortability5
## 5.355398e-01 5.634598e-01
## Comfortability6 Comfortability7
## 4.691541e-01 5.874722e-01
## Comfortability8 Comfortability9
## 6.253452e-01 8.312700e-01
## Comfortability10 Comfortability11
## 1.115816e+00 1.736966e+00
## Comfortability12 Comfortability13
## 2.047080e+00 2.295493e+00
## Comfortability14 Comfortability15
## 1.539929e+00 NA
## Flight.Distance.log Age.log
## 9.968942e-01 6.744272e+00
## Flight.Distance.log.z Age.log.z
## NA NA
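Each value above is an odds ratio, exp(β). A 95% Wald interval for it is exp(β ± 1.96·SE). The worked example below uses the `Customer.TypeReturning` estimate and standard error from the summary table. It shows that returning customers have roughly 56 times the odds of being satisfied as first-time customers, holding the other predictors fixed. `exp(confint.default(model_log))` should return the same kind of Wald interval for every coefficient at once.

```r
# Values copied from the summary(model_log) table above
est <- 4.032          # log-odds coefficient of Customer.TypeReturning
se  <- 0.05869        # its standard error
z   <- qnorm(0.975)   # about 1.96 for a two-sided 95% interval

odds_ratio <- exp(est)
ci <- exp(c(lower = est - z * se, upper = est + z * se))
round(c(odds_ratio = odds_ratio, ci), 2)
```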
Lorem Ipsum
Visualize the coefficients
coefs <- summary(model_log)$coefficients
coef_df <- data.frame(
Variable = rownames(coefs),
Estimate = coefs[, "Estimate"],
StdError = coefs[, "Std. Error"],
p_value = coefs[, "Pr(>|z|)"]
)
# Remove intercept for clarity
coef_df <- coef_df[coef_df$Variable != "(Intercept)", ]
# Plot using ggplot2
library(ggplot2)
ggplot(coef_df, aes(x = reorder(Variable, Estimate), y = Estimate)) +
geom_point(color = "darkblue") +
geom_errorbar(aes(ymin = Estimate - 1.96 * StdError,
ymax = Estimate + 1.96 * StdError), width = 0.2) +
coord_flip() +
labs(title = "Logistic Regression Coefficients",
x = "Predictor",
y = "Coefficient Estimate (log-odds)") +
theme_minimal()
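Several terms have standard errors in the hundreds or thousands, for example `On.board.Service5` (SE 2007) and `In.flight.Wifi.Service1` (SE 89.26). Their ±1.96·SE error bars stretch the axis and hide the stable coefficients. One option is to filter those terms before plotting. The sketch below applies this to four rows copied from the summary table. The cut-off `max_se` is a hypothetical choice and should be set by inspecting the table.

```r
# Four rows copied from the summary(model_log) table above
coef_df <- data.frame(
  Variable = c("Customer.TypeReturning", "Type.of.TravelPersonal",
               "On.board.Service5", "In.flight.Wifi.Service1"),
  Estimate = c(4.032, -4.932, 6.091, -27.84),
  StdError = c(0.05869, 0.06496, 2007, 89.26)
)

max_se <- 5   # hypothetical cut-off for "numerically unstable" terms
stable <- subset(coef_df, StdError < max_se)
stable$lower <- stable$Estimate - 1.96 * stable$StdError
stable$upper <- stable$Estimate + 1.96 * stable$StdError
stable   # pass this data frame to the ggplot2 code above instead of coef_df
```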
Lorem Ipsum
Interpreting the z-values
model_summary <- summary(model_log)
# Extract z-values and convert to data frame
z_df <- data.frame(
Variable = rownames(model_summary$coefficients),
Z_value = model_summary$coefficients[, "z value"]
)
# Remove intercept
z_df <- subset(z_df, Variable != "(Intercept)")
z_df
## Variable
## GenderMale GenderMale
## Age Age
## Customer.TypeReturning Customer.TypeReturning
## Type.of.TravelPersonal Type.of.TravelPersonal
## ClassEconomy ClassEconomy
## ClassEconomy Plus ClassEconomy Plus
## Flight.Distance Flight.Distance
## Departure.Delay Departure.Delay
## Arrival.Delay Arrival.Delay
## Departure.and.Arrival.Time.Convenience1 Departure.and.Arrival.Time.Convenience1
## Departure.and.Arrival.Time.Convenience2 Departure.and.Arrival.Time.Convenience2
## Departure.and.Arrival.Time.Convenience3 Departure.and.Arrival.Time.Convenience3
## Departure.and.Arrival.Time.Convenience4 Departure.and.Arrival.Time.Convenience4
## Departure.and.Arrival.Time.Convenience5 Departure.and.Arrival.Time.Convenience5
## Ease.of.Online.Booking1 Ease.of.Online.Booking1
## Ease.of.Online.Booking2 Ease.of.Online.Booking2
## Ease.of.Online.Booking3 Ease.of.Online.Booking3
## Ease.of.Online.Booking4 Ease.of.Online.Booking4
## Ease.of.Online.Booking5 Ease.of.Online.Booking5
## Check.in.Service2 Check.in.Service2
## Check.in.Service3 Check.in.Service3
## Check.in.Service4 Check.in.Service4
## Check.in.Service5 Check.in.Service5
## Online.Boarding1 Online.Boarding1
## Online.Boarding2 Online.Boarding2
## Online.Boarding3 Online.Boarding3
## Online.Boarding4 Online.Boarding4
## Online.Boarding5 Online.Boarding5
## Gate.Location2 Gate.Location2
## Gate.Location3 Gate.Location3
## Gate.Location4 Gate.Location4
## Gate.Location5 Gate.Location5
## On.board.Service2 On.board.Service2
## On.board.Service3 On.board.Service3
## On.board.Service4 On.board.Service4
## On.board.Service5 On.board.Service5
## Seat.Comfort2 Seat.Comfort2
## Seat.Comfort3 Seat.Comfort3
## Seat.Comfort4 Seat.Comfort4
## Seat.Comfort5 Seat.Comfort5
## Leg.Room.Service1 Leg.Room.Service1
## Leg.Room.Service2 Leg.Room.Service2
## Leg.Room.Service3 Leg.Room.Service3
## Leg.Room.Service4 Leg.Room.Service4
## Leg.Room.Service5 Leg.Room.Service5
## Cleanliness2 Cleanliness2
## Cleanliness3 Cleanliness3
## Cleanliness4 Cleanliness4
## Cleanliness5 Cleanliness5
## Food.and.Drink1 Food.and.Drink1
## Food.and.Drink2 Food.and.Drink2
## Food.and.Drink3 Food.and.Drink3
## Food.and.Drink4 Food.and.Drink4
## Food.and.Drink5 Food.and.Drink5
## In.flight.Service2 In.flight.Service2
## In.flight.Service3 In.flight.Service3
## In.flight.Service4 In.flight.Service4
## In.flight.Service5 In.flight.Service5
## In.flight.Wifi.Service1 In.flight.Wifi.Service1
## In.flight.Wifi.Service2 In.flight.Wifi.Service2
## In.flight.Wifi.Service3 In.flight.Wifi.Service3
## In.flight.Wifi.Service4 In.flight.Wifi.Service4
## In.flight.Wifi.Service5 In.flight.Wifi.Service5
## In.flight.Entertainment2 In.flight.Entertainment2
## In.flight.Entertainment3 In.flight.Entertainment3
## In.flight.Entertainment4 In.flight.Entertainment4
## In.flight.Entertainment5 In.flight.Entertainment5
## Baggage.Handling2 Baggage.Handling2
## Baggage.Handling3 Baggage.Handling3
## Baggage.Handling4 Baggage.Handling4
## Baggage.Handling5 Baggage.Handling5
## Age_RangeChildren Age_RangeChildren
## Age_RangeSenior Age_RangeSenior
## Age_RangeYouth Age_RangeYouth
## Distance_GroupMedium-haul Distance_GroupMedium-haul
## Distance_GroupShort-haul Distance_GroupShort-haul
## departure.delay.statusOn Time departure.delay.statusOn Time
## arrival.delay.statusOn Time arrival.delay.statusOn Time
## delay.statusOn-time delay.statusOn-time
## In.flight.service3 In.flight.service3
## In.flight.service4 In.flight.service4
## In.flight.service5 In.flight.service5
## In.flight.service6 In.flight.service6
## In.flight.service7 In.flight.service7
## In.flight.service8 In.flight.service8
## In.flight.service9 In.flight.service9
## In.flight.service10 In.flight.service10
## In.flight.service11 In.flight.service11
## In.flight.service12 In.flight.service12
## In.flight.service13 In.flight.service13
## In.flight.service14 In.flight.service14
## Flight.Entertainment2 Flight.Entertainment2
## Flight.Entertainment3 Flight.Entertainment3
## Flight.Entertainment4 Flight.Entertainment4
## Flight.Entertainment5 Flight.Entertainment5
## Flight.Entertainment6 Flight.Entertainment6
## Flight.Entertainment7 Flight.Entertainment7
## Flight.Entertainment8 Flight.Entertainment8
## Flight.Entertainment9 Flight.Entertainment9
## Pre.flight.service2 Pre.flight.service2
## Pre.flight.service3 Pre.flight.service3
## Pre.flight.service4 Pre.flight.service4
## Pre.flight.service5 Pre.flight.service5
## Pre.flight.service6 Pre.flight.service6
## Pre.flight.service7 Pre.flight.service7
## Pre.flight.service8 Pre.flight.service8
## Pre.flight.service9 Pre.flight.service9
## Pre.flight.service10 Pre.flight.service10
## Pre.flight.service11 Pre.flight.service11
## Pre.flight.service12 Pre.flight.service12
## Pre.flight.service13 Pre.flight.service13
## Pre.flight.service14 Pre.flight.service14
## Comfortability4 Comfortability4
## Comfortability5 Comfortability5
## Comfortability6 Comfortability6
## Comfortability7 Comfortability7
## Comfortability8 Comfortability8
## Comfortability9 Comfortability9
## Comfortability10 Comfortability10
## Comfortability11 Comfortability11
## Comfortability12 Comfortability12
## Comfortability13 Comfortability13
## Comfortability14 Comfortability14
## Flight.Distance.log Flight.Distance.log
## Age.log Age.log
## Z_value
## GenderMale 1.999334e+00
## Age -4.723295e+00
## Customer.TypeReturning 6.870771e+01
## Type.of.TravelPersonal -7.591977e+01
## ClassEconomy -1.190258e+01
## ClassEconomy Plus -9.290346e+00
## Flight.Distance -7.775394e-02
## Departure.Delay -3.830762e-02
## Arrival.Delay -8.856472e+00
## Departure.and.Arrival.Time.Convenience1 4.255833e+00
## Departure.and.Arrival.Time.Convenience2 5.866094e+00
## Departure.and.Arrival.Time.Convenience3 4.316990e+00
## Departure.and.Arrival.Time.Convenience4 -6.152979e+00
## Departure.and.Arrival.Time.Convenience5 -7.332235e+00
## Ease.of.Online.Booking1 3.760253e+00
## Ease.of.Online.Booking2 3.806809e+00
## Ease.of.Online.Booking3 4.427810e+00
## Ease.of.Online.Booking4 5.215638e+00
## Ease.of.Online.Booking5 4.611790e+00
## Check.in.Service2 3.228607e-01
## Check.in.Service3 5.075217e+00
## Check.in.Service4 2.699058e+00
## Check.in.Service5 7.464104e+00
## Online.Boarding1 -3.921765e+00
## Online.Boarding2 -4.068456e+00
## Online.Boarding3 -4.534292e+00
## Online.Boarding4 -2.893096e+00
## Online.Boarding5 -1.521980e+00
## Gate.Location2 6.856953e-01
## Gate.Location3 -3.289047e+00
## Gate.Location4 -6.468111e+00
## Gate.Location5 -8.813784e+00
## On.board.Service2 2.418822e-03
## On.board.Service3 2.954908e-03
## On.board.Service4 2.813729e-03
## On.board.Service5 3.034978e-03
## Seat.Comfort2 -6.339089e+00
## Seat.Comfort3 -1.988227e+01
## Seat.Comfort4 -1.370747e+01
## Seat.Comfort5 -6.289547e+00
## Leg.Room.Service1 -2.347720e+00
## Leg.Room.Service2 -2.381542e+00
## Leg.Room.Service3 -2.858670e+00
## Leg.Room.Service4 -2.350528e+00
## Leg.Room.Service5 -2.177953e+00
## Cleanliness2 -1.648931e+00
## Cleanliness3 3.448210e-01
## Cleanliness4 -2.198863e+00
## Cleanliness5 3.275382e+00
## Food.and.Drink1 -1.725739e-03
## Food.and.Drink2 6.897699e-04
## Food.and.Drink3 1.215800e-03
## Food.and.Drink4 1.615641e-03
## Food.and.Drink5 1.731090e-03
## In.flight.Service2 1.532544e-03
## In.flight.Service3 1.357904e-03
## In.flight.Service4 2.167153e-03
## In.flight.Service5 2.639178e-03
## In.flight.Wifi.Service1 -3.119147e-01
## In.flight.Wifi.Service2 -3.174474e-01
## In.flight.Wifi.Service3 -2.887288e-01
## In.flight.Wifi.Service4 -2.330120e-01
## In.flight.Wifi.Service5 -1.350787e-01
## In.flight.Entertainment2 -5.829947e-03
## In.flight.Entertainment3 -5.531979e-03
## In.flight.Entertainment4 -8.736260e-03
## In.flight.Entertainment5 -1.347306e-02
## Baggage.Handling2 -4.422792e+00
## Baggage.Handling3 -1.319486e+01
## Baggage.Handling4 -5.489728e+00
## Baggage.Handling5 3.034167e+00
## Age_RangeChildren 4.196687e+00
## Age_RangeSenior -5.963020e+00
## Age_RangeYouth 7.555357e+00
## Distance_GroupMedium-haul 1.888718e+00
## Distance_GroupShort-haul 3.069992e-01
## departure.delay.statusOn Time -2.174254e-01
## arrival.delay.statusOn Time -3.075214e+00
## delay.statusOn-time 9.366444e-01
## In.flight.service3 2.356147e-03
## In.flight.service4 2.488625e-03
## In.flight.service5 2.599583e-03
## In.flight.service6 2.623794e-03
## In.flight.service7 2.649564e-03
## In.flight.service8 2.666602e-03
## In.flight.service9 2.697852e-03
## In.flight.service10 2.751651e-03
## In.flight.service11 2.862697e-03
## In.flight.service12 2.940588e-03
## In.flight.service13 3.167990e-03
## In.flight.service14 3.093229e-03
## Flight.Entertainment2 -2.232352e-03
## Flight.Entertainment3 1.933613e-02
## Flight.Entertainment4 2.328835e-02
## Flight.Entertainment5 3.951144e-02
## Flight.Entertainment6 5.161376e-02
## Flight.Entertainment7 7.855705e-02
## Flight.Entertainment8 9.800903e-02
## Flight.Entertainment9 2.318008e-01
## Pre.flight.service2 6.307592e-01
## Pre.flight.service3 2.695806e-01
## Pre.flight.service4 5.835432e-01
## Pre.flight.service5 5.344671e-01
## Pre.flight.service6 8.021408e-01
## Pre.flight.service7 5.401240e-01
## Pre.flight.service8 1.346707e+00
## Pre.flight.service9 1.835491e+00
## Pre.flight.service10 2.952112e+00
## Pre.flight.service11 3.638305e+00
## Pre.flight.service12 4.051440e+00
## Pre.flight.service13 1.335722e+00
## Pre.flight.service14 2.002566e+00
## Comfortability4 -2.569530e+00
## Comfortability5 -2.832253e+00
## Comfortability6 -4.089643e+00
## Comfortability7 -3.262921e+00
## Comfortability8 -3.056510e+00
## Comfortability9 -1.368259e+00
## Comfortability10 8.480484e-01
## Comfortability11 4.763554e+00
## Comfortability12 6.318782e+00
## Comfortability13 7.767700e+00
## Comfortability14 3.541785e+00
## Flight.Distance.log -6.029697e-02
## Age.log 5.561852e+00
# Plot using ggplot2
library(ggplot2)
ggplot(z_df, aes(x = reorder(Variable, abs(Z_value)), y = Z_value)) +
geom_col(fill = "steelblue") +
geom_hline(yintercept = c(-1.96, 1.96), linetype = "dashed", color = "red") + # 95% significance threshold
coord_flip() +
labs(title = "Z-values from Logistic Regression",
x = "Predictor Variable",
y = "Z-value (Wald Statistic)") +
theme_minimal()
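The dashed lines at ±1.96 come from the standard normal distribution: qnorm(0.975) ≈ 1.96. A Wald z-value maps to a two-sided p-value through p = 2·Φ(−|z|). As a check, the sketch below recovers the p-value reported for `GenderMale` (z = 1.999, p = 0.0456) from its z-value. `wald_p()` is a hypothetical helper name.

```r
# Two-sided Wald p-value from a z statistic
wald_p <- function(z) 2 * pnorm(-abs(z))

z_crit   <- qnorm(0.975)         # two-sided 5% cut-off, about 1.96
p_gender <- wald_p(1.999334)     # z-value of GenderMale from z_df above
c(z_crit = z_crit, p_gender = p_gender)
```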
Lorem Ipsum
pred.test <- predict(model_log, newdata = testing, type = "response")
# Convert predicted probabilities of "Satisfied" into class labels (threshold 0.5)
pred.test.result <- ifelse(pred.test > 0.5, "Satisfied", "Neutral or Dissatisfied")
testing$Satisfaction <- factor(testing$Satisfaction, levels = c("Satisfied", "Neutral or Dissatisfied"))
pred.test.result <- factor(pred.test.result, levels = c("Satisfied", "Neutral or Dissatisfied"))
confusionMatrix(pred.test.result, testing$Satisfaction)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Satisfied Neutral or Dissatisfied
## Satisfied 15534 1082
## Neutral or Dissatisfied 1380 20962
##
## Accuracy : 0.9368
## 95% CI : (0.9343, 0.9392)
## No Information Rate : 0.5658
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8711
##
## Mcnemar's Test P-Value : 2.155e-09
##
## Sensitivity : 0.9184
## Specificity : 0.9509
## Pos Pred Value : 0.9349
## Neg Pred Value : 0.9382
## Prevalence : 0.4342
## Detection Rate : 0.3987
## Detection Prevalence : 0.4265
## Balanced Accuracy : 0.9347
##
## 'Positive' Class : Satisfied
##
Accuracy is 93.7%, with sensitivity 91.8% and specificity 95.1% (positive class: Satisfied).
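The statistics reported by `confusionMatrix()` can be recomputed from the four counts in the table, with "Satisfied" as the positive class. This makes it clear which figure is sensitivity and which is specificity. The sketch also derives Cohen's kappa, which is observed agreement corrected for the agreement expected by chance.

```r
# Counts copied from the confusionMatrix() output above (positive class = Satisfied)
tp <- 15534  # predicted Satisfied, actually Satisfied
fp <- 1082   # predicted Satisfied, actually Neutral or Dissatisfied
fn <- 1380   # predicted Neutral or Dissatisfied, actually Satisfied
tn <- 20962  # predicted Neutral or Dissatisfied, actually Neutral or Dissatisfied
n  <- tp + fp + fn + tn

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)   # share of satisfied passengers correctly identified
specificity <- tn / (tn + fp)   # share of neutral/dissatisfied passengers correctly identified
balanced    <- (sensitivity + specificity) / 2

# Chance agreement: product of predicted and actual marginal shares, summed over classes
p_expected   <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2
cohen_kappa  <- (accuracy - p_expected) / (1 - p_expected)

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity,
        balanced = balanced, kappa = cohen_kappa), 4)
```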
Interpret ROC and AUC
library(pROC)
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following object is masked from 'package:gmodels':
##
## ci
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
# ROC curve and AUC
roc_obj <- roc(testing$Satisfaction, pred.test)
## Setting levels: control = Satisfied, case = Neutral or Dissatisfied
## Setting direction: controls > cases
plot(roc_obj, col = "blue", main = "ROC Curve")
auc(roc_obj)
## Area under the curve: 0.9825
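An AUC of 0.9825 means that, for a randomly chosen satisfied passenger and a randomly chosen neutral or dissatisfied passenger, the model gives the satisfied passenger the higher predicted probability of satisfaction about 98% of the time.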
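That pairwise reading of AUC is the Mann-Whitney formulation, which can be computed from ranks alone. The sketch below uses a hypothetical six-case toy example (not the airline data) and a helper, `auc_rank()`, that is our own illustration rather than part of pROC.

```r
# AUC as the probability that a random positive case scores higher than a
# random negative case (ties count as 1/2), computed from average ranks.
auc_rank <- function(scores, is_positive) {
  r  <- rank(scores)                 # average ranks, so ties are handled
  n1 <- sum(is_positive)
  n0 <- sum(!is_positive)
  (sum(r[is_positive]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical toy scores: 3 positives, 3 negatives
scores      <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2)
is_positive <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

auc_rank(scores, is_positive)   # 8 of the 9 positive/negative pairs are ordered correctly
```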
Diagnostic plots for the full logistic model:
plot(model_log)
## Warning: not plotting observations with leverage one:
## 14720
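R omits observation 14720 from the residuals-vs-leverage panel because its leverage (hat value) equals one, meaning its fitted value is determined entirely by that single observation; that passenger's record is worth inspecting.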
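One common way to get leverage exactly one is a factor level that occurs in a single row: that row's dummy column lets the model reproduce it on its own. The sketch below illustrates the mechanism with `lm()` on hypothetical toy data (a deliberate simplification, since the document's model is a `glm`); `hatvalues()` works on both kinds of fit.

```r
# Toy data: factor level "b" appears only once (hypothetical values)
toy <- data.frame(
  y = c(1.2, 0.8, 1.1, 0.9, 1.0, 3.5),
  g = factor(c("a", "a", "a", "a", "a", "b"))
)

fit <- lm(y ~ g, data = toy)
h   <- hatvalues(fit)   # diagonal of the hat matrix

round(h, 3)             # rows 1-5 share leverage 1/5; row 6 has leverage 1
which(h > 1 - 1e-8)     # flags the singleton observation
```

Applied to the document's model, something like `which(hatvalues(model_log) > 1 - 1e-8)` should point to the same observation the warning names, though we have not run it on this data.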
Using significant variables only (p-value < 0.05)
training_signif <- training %>%
  subset(select = -c(Departure.Delay, Arrival.Delay, Flight.Distance,
                     In.flight.Wifi.Service, Food.and.Drink,
                     In.flight.Entertainment, Age_Range, Distance_Group,
                     In.flight.service, Flight.Entertainment,
                     Comfortability, Pre.flight.service,
                     Flight.Distance.log.z, Age.log.z))
model_log_signif <- glm(Satisfaction ~ .,
                        data = training_signif, family = "binomial")
summary(model_log_signif)
##
## Call:
## glm(formula = Satisfaction ~ ., family = "binomial", data = training_signif)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.726402 0.386841 -12.218 < 2e-16
## GenderMale 0.056066 0.024515 2.287 0.022197
## Age -0.026105 0.003329 -7.841 4.48e-15
## Customer.TypeReturning 2.706964 0.040587 66.696 < 2e-16
## Type.of.TravelPersonal -3.758393 0.045014 -83.494 < 2e-16
## ClassEconomy -0.430542 0.031084 -13.851 < 2e-16
## ClassEconomy Plus -0.582623 0.049657 -11.733 < 2e-16
## Departure.and.Arrival.Time.Convenience1 0.424447 0.076418 5.554 2.79e-08
## Departure.and.Arrival.Time.Convenience2 0.526151 0.074155 7.095 1.29e-12
## Departure.and.Arrival.Time.Convenience3 0.432261 0.071384 6.055 1.40e-09
## Departure.and.Arrival.Time.Convenience4 -0.444639 0.063216 -7.034 2.01e-12
## Departure.and.Arrival.Time.Convenience5 -0.530832 0.067259 -7.892 2.97e-15
## Ease.of.Online.Booking1 -3.553943 0.132761 -26.769 < 2e-16
## Ease.of.Online.Booking2 -3.887591 0.131261 -29.617 < 2e-16
## Ease.of.Online.Booking3 -3.584740 0.129080 -27.771 < 2e-16
## Ease.of.Online.Booking4 -2.207070 0.126337 -17.470 < 2e-16
## Ease.of.Online.Booking5 -1.470148 0.128007 -11.485 < 2e-16
## Check.in.Service2 0.092070 0.046345 1.987 0.046962
## Check.in.Service3 0.474283 0.041352 11.469 < 2e-16
## Check.in.Service4 0.414419 0.041258 10.045 < 2e-16
## Check.in.Service5 1.087686 0.047203 23.043 < 2e-16
## Online.Boarding1 -0.921454 0.136197 -6.766 1.33e-11
## Online.Boarding2 -1.003314 0.135997 -7.377 1.61e-13
## Online.Boarding3 -0.980285 0.134391 -7.294 3.00e-13
## Online.Boarding4 0.974438 0.133759 7.285 3.22e-13
## Online.Boarding5 2.630697 0.136610 19.257 < 2e-16
## Gate.Location2 0.168932 0.052704 3.205 0.001349
## Gate.Location3 -0.016261 0.048317 -0.337 0.736463
## Gate.Location4 -0.134138 0.049195 -2.727 0.006399
## Gate.Location5 -0.653459 0.061010 -10.711 < 2e-16
## On.board.Service2 0.140519 0.055273 2.542 0.011014
## On.board.Service3 0.772571 0.050090 15.424 < 2e-16
## On.board.Service4 0.967030 0.049755 19.436 < 2e-16
## On.board.Service5 1.245563 0.055245 22.546 < 2e-16
## Seat.Comfort2 -0.183780 0.058457 -3.144 0.001668
## Seat.Comfort3 -0.997303 0.054562 -18.278 < 2e-16
## Seat.Comfort4 -0.434206 0.054197 -8.012 1.13e-15
## Seat.Comfort5 0.027584 0.058606 0.471 0.637878
## Leg.Room.Service1 0.912231 0.188171 4.848 1.25e-06
## Leg.Room.Service2 1.132125 0.186907 6.057 1.39e-09
## Leg.Room.Service3 1.093448 0.186793 5.854 4.80e-09
## Leg.Room.Service4 1.937807 0.187013 10.362 < 2e-16
## Leg.Room.Service5 2.033967 0.187757 10.833 < 2e-16
## Cleanliness2 0.210623 0.054553 3.861 0.000113
## Cleanliness3 0.826451 0.050262 16.443 < 2e-16
## Cleanliness4 0.849126 0.050270 16.891 < 2e-16
## Cleanliness5 0.987434 0.056510 17.474 < 2e-16
## In.flight.Service2 -0.070701 0.065992 -1.071 0.284011
## In.flight.Service3 -0.508676 0.061512 -8.270 < 2e-16
## In.flight.Service4 0.115504 0.059806 1.931 0.053445
## In.flight.Service5 0.606356 0.063693 9.520 < 2e-16
## Baggage.Handling2 -0.097400 0.064499 -1.510 0.131022
## Baggage.Handling3 -0.418477 0.060326 -6.937 4.01e-12
## Baggage.Handling4 0.157687 0.059131 2.667 0.007659
## Baggage.Handling5 0.677045 0.063006 10.746 < 2e-16
## departure.delay.statusOn Time 0.063197 0.085602 0.738 0.460350
## arrival.delay.statusOn Time 0.273720 0.088764 3.084 0.002045
## delay.statusOn-time 0.262976 0.103781 2.534 0.011279
## Flight.Distance.log 0.058465 0.015167 3.855 0.000116
## Age.log 0.761836 0.116380 6.546 5.91e-11
##
## (Intercept) ***
## GenderMale *
## Age ***
## Customer.TypeReturning ***
## Type.of.TravelPersonal ***
## ClassEconomy ***
## ClassEconomy Plus ***
## Departure.and.Arrival.Time.Convenience1 ***
## Departure.and.Arrival.Time.Convenience2 ***
## Departure.and.Arrival.Time.Convenience3 ***
## Departure.and.Arrival.Time.Convenience4 ***
## Departure.and.Arrival.Time.Convenience5 ***
## Ease.of.Online.Booking1 ***
## Ease.of.Online.Booking2 ***
## Ease.of.Online.Booking3 ***
## Ease.of.Online.Booking4 ***
## Ease.of.Online.Booking5 ***
## Check.in.Service2 *
## Check.in.Service3 ***
## Check.in.Service4 ***
## Check.in.Service5 ***
## Online.Boarding1 ***
## Online.Boarding2 ***
## Online.Boarding3 ***
## Online.Boarding4 ***
## Online.Boarding5 ***
## Gate.Location2 **
## Gate.Location3
## Gate.Location4 **
## Gate.Location5 ***
## On.board.Service2 *
## On.board.Service3 ***
## On.board.Service4 ***
## On.board.Service5 ***
## Seat.Comfort2 **
## Seat.Comfort3 ***
## Seat.Comfort4 ***
## Seat.Comfort5
## Leg.Room.Service1 ***
## Leg.Room.Service2 ***
## Leg.Room.Service3 ***
## Leg.Room.Service4 ***
## Leg.Room.Service5 ***
## Cleanliness2 ***
## Cleanliness3 ***
## Cleanliness4 ***
## Cleanliness5 ***
## In.flight.Service2
## In.flight.Service3 ***
## In.flight.Service4 .
## In.flight.Service5 ***
## Baggage.Handling2
## Baggage.Handling3 ***
## Baggage.Handling4 **
## Baggage.Handling5 ***
## departure.delay.statusOn Time
## arrival.delay.statusOn Time **
## delay.statusOn-time *
## Flight.Distance.log ***
## Age.log ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 124461 on 90901 degrees of freedom
## Residual deviance: 44537 on 90842 degrees of freedom
## AIC: 44657
##
## Number of Fisher Scoring iterations: 6
The AIC increases to 44657, so the reduced model fits worse even after accounting for its smaller number of parameters.
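For a logistic regression with a 0/1 response, the saturated log-likelihood is zero, so AIC is simply the residual deviance plus twice the number of estimated coefficients. The sketch below checks the printed AIC using only the deviance and degrees-of-freedom lines from the `model_log_signif` summary above.

```r
# AIC = residual deviance + 2 * k for a binomial GLM with 0/1 outcomes.
# Figures from the model_log_signif summary above.
null_df   <- 90901   # n - 1
resid_df  <- 90842   # n - k
resid_dev <- 44537

k   <- null_df - resid_df + 1   # estimated coefficients, intercept included
aic <- resid_dev + 2 * k

c(k = k, aic = aic)
```

The same arithmetic applied to the full model's residual deviance (29296 on 90776 df, from the ANOVA table in this section) gives roughly 29296 + 2 × 126 ≈ 29548, consistent with the full model having the lower AIC.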
pred.test <- predict(model_log_signif, newdata = testing, type = "response")
# Convert to class labels
pred.test.result <- ifelse(pred.test > 0.5, "Satisfied", "Neutral or Dissatisfied")
testing$Satisfaction <- factor(testing$Satisfaction, levels = c("Satisfied", "Neutral or Dissatisfied"))
pred.test.result <- factor(pred.test.result, levels = c("Satisfied", "Neutral or Dissatisfied"))
confusionMatrix(pred.test.result, testing$Satisfaction)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Satisfied Neutral or Dissatisfied
## Satisfied 14778 1700
## Neutral or Dissatisfied 2136 20344
##
## Accuracy : 0.9015
## 95% CI : (0.8985, 0.9045)
## No Information Rate : 0.5658
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.799
##
## Mcnemar's Test P-Value : 2.165e-12
##
## Sensitivity : 0.8737
## Specificity : 0.9229
## Pos Pred Value : 0.8968
## Neg Pred Value : 0.9050
## Prevalence : 0.4342
## Detection Rate : 0.3793
## Detection Prevalence : 0.4230
## Balanced Accuracy : 0.8983
##
## 'Positive' Class : Satisfied
##
Accuracy decreases from 0.9368 to 0.9015 (Kappa from 0.8711 to 0.799), and sensitivity falls from 0.9184 to 0.8737.
Likelihood-ratio test (analysis of deviance) between the full and reduced models:
anova(model_log, model_log_signif, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: Satisfaction ~ Gender + Age + Customer.Type + Type.of.Travel +
## Class + Flight.Distance + Departure.Delay + Arrival.Delay +
## Departure.and.Arrival.Time.Convenience + Ease.of.Online.Booking +
## Check.in.Service + Online.Boarding + Gate.Location + On.board.Service +
## Seat.Comfort + Leg.Room.Service + Cleanliness + Food.and.Drink +
## In.flight.Service + In.flight.Wifi.Service + In.flight.Entertainment +
## Baggage.Handling + Age_Range + Distance_Group + departure.delay.status +
## arrival.delay.status + delay.status + In.flight.service +
## Flight.Entertainment + Pre.flight.service + Comfortability +
## Flight.Distance.log + Age.log + Flight.Distance.log.z + Age.log.z
## Model 2: Satisfaction ~ Gender + Age + Customer.Type + Type.of.Travel +
## Class + Departure.and.Arrival.Time.Convenience + Ease.of.Online.Booking +
## Check.in.Service + Online.Boarding + Gate.Location + On.board.Service +
## Seat.Comfort + Leg.Room.Service + Cleanliness + In.flight.Service +
## Baggage.Handling + departure.delay.status + arrival.delay.status +
## delay.status + Flight.Distance.log + Age.log
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 90776 29296
## 2 90842 44537 -66 -15241 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
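The chi-square test in the table compares the drop in residual deviance with the number of coefficients removed. The sketch below reproduces it from the `Resid. Df` and `Resid. Dev` columns.

```r
# Likelihood-ratio test between nested GLMs, from the table above
dev_full    <- 29296; df_full    <- 90776
dev_reduced <- 44537; df_reduced <- 90842

lr_stat <- dev_reduced - dev_full   # increase in deviance from dropping terms
lr_df   <- df_reduced - df_full     # number of coefficients dropped
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value)
```

A deviance increase of 15241 on 66 df is so extreme that the p-value underflows to zero, which R prints as `< 2.2e-16`.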
The full model fits significantly better than the model with significant variables only (deviance difference 15241 on 66 df, p < 2.2e-16).
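The reduced model was built from per-coefficient Wald p-values, but rating variables such as `Seat.Comfort` enter the model as several dummy coefficients. An alternative, sketched below on simulated data (the variables `g`, `x`, `noise` are hypothetical), is `drop1(..., test = "Chisq")`, which runs a likelihood-ratio test for removing each whole term at once.

```r
# Simulated data: a 3-level factor and a numeric predictor with real effects,
# plus a pure-noise predictor (all hypothetical)
set.seed(1)
n   <- 500
sim <- data.frame(
  g     = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x     = rnorm(n),
  noise = rnorm(n)
)
eta   <- -0.5 + c(a = 0, b = 1, c = -1)[as.character(sim$g)] + 0.8 * sim$x
sim$y <- rbinom(n, 1, plogis(eta))

fit <- glm(y ~ g + x + noise, family = binomial, data = sim)

# One likelihood-ratio test per term (all dummies of a factor together)
term_tests <- drop1(fit, test = "Chisq")
term_tests

# Terms whose removal significantly worsens fit at the 5% level
p_terms <- term_tests[["Pr(>Chi)"]][-1]
keep    <- rownames(term_tests)[-1][p_terms < 0.05]
keep
```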
Adjusting Threshold
coords(roc_obj, "best", ret = "threshold")
## threshold
## 1 0.4430813
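By default, pROC's `coords(roc_obj, "best")` picks the cutoff that maximizes Youden's J (sensitivity + specificity − 1). The base-R sketch below illustrates the criterion on hypothetical toy scores; note it uses the observed scores as candidate cutoffs with the rule `score >= cutoff`, whereas pROC evaluates points between sorted scores, so exact values can differ slightly.

```r
# Youden's J over candidate cutoffs (sketch; helper name is our own)
youden_threshold <- function(scores, is_positive) {
  cutoffs <- sort(unique(scores), decreasing = TRUE)
  j <- vapply(cutoffs, function(t) {
    pred_pos <- scores >= t
    sens <- sum(pred_pos & is_positive) / sum(is_positive)
    spec <- sum(!pred_pos & !is_positive) / sum(!is_positive)
    sens + spec - 1
  }, numeric(1))
  cutoffs[which.max(j)]   # cutoff with the largest J
}

# Hypothetical toy data: 4 positives, 3 negatives
scores      <- c(0.9, 0.8, 0.6, 0.5, 0.7, 0.3, 0.2)
is_positive <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

youden_threshold(scores, is_positive)
```

Since the printed `coords()` result above is a data frame, the cutoff can presumably be reused directly with `coords(roc_obj, "best", ret = "threshold")$threshold` instead of being typed in by hand.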
pred.test <- predict(model_log, newdata = testing, type = "response")
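# Convert to class labels using an adjusted cutoff
# NOTE: 0.5012364 is hard-coded and differs from the coords() result above (0.4430813)
pred.test.result <- ifelse(pred.test > 0.5012364, "Satisfied", "Neutral or Dissatisfied")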
testing$Satisfaction <- factor(testing$Satisfaction, levels = c("Satisfied", "Neutral or Dissatisfied"))
pred.test.result <- factor(pred.test.result, levels = c("Satisfied", "Neutral or Dissatisfied"))
confusionMatrix(pred.test.result, testing$Satisfaction)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Satisfied Neutral or Dissatisfied
## Satisfied 15531 1076
## Neutral or Dissatisfied 1383 20968
##
## Accuracy : 0.9369
## 95% CI : (0.9344, 0.9393)
## No Information Rate : 0.5658
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8713
##
## Mcnemar's Test P-Value : 6.794e-10
##
## Sensitivity : 0.9182
## Specificity : 0.9512
## Pos Pred Value : 0.9352
## Neg Pred Value : 0.9381
## Prevalence : 0.4342
## Detection Rate : 0.3987
## Detection Prevalence : 0.4263
## Balanced Accuracy : 0.9347
##
## 'Positive' Class : Satisfied
##
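Adjusting the cutoff does not improve performance meaningfully: accuracy is 0.9369 versus 0.9368 at the default 0.5 cutoff, and balanced accuracy stays at 0.9347.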
Dimensionality Reduction
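# Convert columns whose values are all numeric-like (e.g. ratings stored as factors)
# to numeric, keep only numeric columns, then standardize them
training_pca <- training %>%
  mutate(across(everything(), ~ if (all(!is.na(as.numeric(as.character(.x)))))
    as.numeric(as.character(.x)) else .x)) %>%
  select(where(is.numeric)) %>%
  scale(center = TRUE, scale = TRUE)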
## Warning: There were 10 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `across(everything(), ~if
## (all(!is.na(as.numeric(as.character(.x))))) as.numeric(as.character(.x)) else
## .x)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 9 remaining warnings.
pca_results <- prcomp(training_pca)
summary(pca_results)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.5297 1.7720 1.6975 1.6488 1.50325 1.2056 1.14900
## Proportion of Variance 0.2461 0.1208 0.1108 0.1046 0.08691 0.0559 0.05078
## Cumulative Proportion 0.2461 0.3669 0.4777 0.5823 0.66920 0.7251 0.77588
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.01453 0.95402 0.73911 0.73725 0.72627 0.71765 0.65246
## Proportion of Variance 0.03959 0.03501 0.02101 0.02091 0.02029 0.01981 0.01637
## Cumulative Proportion 0.81546 0.85047 0.87148 0.89239 0.91267 0.93248 0.94886
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.60691 0.5467 0.53336 0.46794 0.33579 0.21528 1.381e-14
## Proportion of Variance 0.01417 0.0115 0.01094 0.00842 0.00434 0.00178 0.000e+00
## Cumulative Proportion 0.96302 0.9745 0.98546 0.99388 0.99822 1.00000 1.000e+00
## PC22 PC23 PC24 PC25 PC26
## Standard deviation 8.31e-15 6.616e-15 5.473e-15 4.937e-15 2.821e-15
## Proportion of Variance 0.00e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 1.00e+00 1.000e+00 1.000e+00 1.000e+00 1.000e+00
The first 12 principal components already explain 91.3% of the variance (cumulative proportion 0.91267).
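The number of components can be chosen programmatically from the standard deviations: each component's share of variance is its squared standard deviation over the total. The sketch below types in the `Standard deviation` row from `summary(pca_results)` above; in the notebook itself, `pca_results$sdev` holds the same values. The six components with standard deviations near 1e-14 carry no variance, which suggests some numeric inputs are exact linear combinations of others (for instance, the `.z` columns look like rescaled `.log` columns).

```r
# Standard deviations of PC1..PC26, copied from summary(pca_results) above
sdev <- c(2.5297, 1.7720, 1.6975, 1.6488, 1.50325, 1.2056, 1.14900,
          1.01453, 0.95402, 0.73911, 0.73725, 0.72627, 0.71765, 0.65246,
          0.60691, 0.5467, 0.53336, 0.46794, 0.33579, 0.21528, 1.381e-14,
          8.31e-15, 6.616e-15, 5.473e-15, 4.937e-15, 2.821e-15)

prop_var <- sdev^2 / sum(sdev^2)   # proportion of variance per component
cum_var  <- cumsum(prop_var)       # cumulative proportion

k <- which(cum_var >= 0.90)[1]     # smallest k reaching 90% of the variance
k
round(cum_var[k], 4)
```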
pca_data <- as.data.frame(pca_results$x[, 1:12])
pca_data$Satisfaction <- training$Satisfaction
model_log_pca <- glm(Satisfaction ~ .,
                     data = pca_data, family = "binomial")
summary(model_log_pca)
##
## Call:
## glm(formula = Satisfaction ~ ., family = "binomial", data = pca_data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.467566 0.009116 -51.289 < 2e-16 ***
## PC1 -0.690108 0.004710 -146.522 < 2e-16 ***
## PC2 0.175815 0.005161 34.067 < 2e-16 ***
## PC3 -0.068314 0.005130 -13.317 < 2e-16 ***
## PC4 -0.148157 0.005356 -27.664 < 2e-16 ***
## PC5 -0.078789 0.005756 -13.689 < 2e-16 ***
## PC6 -0.193519 0.007452 -25.968 < 2e-16 ***
## PC7 0.424116 0.007707 55.028 < 2e-16 ***
## PC8 -0.515440 0.008978 -57.413 < 2e-16 ***
## PC9 0.042990 0.009063 4.743 2.10e-06 ***
## PC10 -0.090449 0.012062 -7.498 6.46e-14 ***
## PC11 0.207175 0.011909 17.396 < 2e-16 ***
## PC12 -0.204512 0.011948 -17.118 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 124461 on 90901 degrees of freedom
## Residual deviance: 80737 on 90889 degrees of freedom
## AIC: 80763
##
## Number of Fisher Scoring iterations: 5
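The AIC is much higher (80763), so the 12-component model fits the training data considerably worse than the significant-variables model (AIC 44657).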
# 1. Convert test data to numeric and scale
test_pca_scaled <- testing %>%
  mutate(across(everything(), ~ if (all(!is.na(as.numeric(as.character(.x)))))
    as.numeric(as.character(.x)) else .x)) %>%
  select(where(is.numeric)) %>%
  scale(center = attr(training_pca, "scaled:center"),
        scale = attr(training_pca, "scaled:scale")) # Use training scale
## Warning: There were 10 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `across(everything(), ~if
## (all(!is.na(as.numeric(as.character(.x))))) as.numeric(as.character(.x)) else
## .x)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 9 remaining warnings.
# 2. Apply PCA transformation
test_pca <- as.data.frame(predict(pca_results, newdata = test_pca_scaled)[, 1:12])
# 3. Add target variable
test_pca$Satisfaction <- testing$Satisfaction
# 4. Predict using the model trained on PCA components
pred.test <- predict(model_log_pca, newdata = test_pca, type = "response")
# Convert to class labels
pred.test.result <- ifelse(pred.test > 0.5, "Satisfied", "Neutral or Dissatisfied")
testing$Satisfaction <- factor(testing$Satisfaction, levels = c("Satisfied", "Neutral or Dissatisfied"))
pred.test.result <- factor(pred.test.result, levels = c("Satisfied", "Neutral or Dissatisfied"))
confusionMatrix(pred.test.result, testing$Satisfaction)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Satisfied Neutral or Dissatisfied
## Satisfied 12085 2567
## Neutral or Dissatisfied 4829 19477
##
## Accuracy : 0.8102
## 95% CI : (0.8062, 0.814)
## No Information Rate : 0.5658
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6075
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.7145
## Specificity : 0.8836
## Pos Pred Value : 0.8248
## Neg Pred Value : 0.8013
## Prevalence : 0.4342
## Detection Rate : 0.3102
## Detection Prevalence : 0.3761
## Balanced Accuracy : 0.7990
##
## 'Positive' Class : Satisfied
##
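Accuracy drops to 0.8102 (Kappa 0.6075) with the 12-component PCA model, compared with 0.9368 for the full model, and sensitivity falls to 0.7145.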
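Step 2 above relies on `predict()` for `prcomp` objects, which centers and scales new data with the statistics stored from the training fit and then multiplies by the rotation (loadings) matrix. The sketch below checks that equivalence on the built-in `USArrests` data as a stand-in for the airline data. In the notebook the test set is already standardized with the training attributes before `predict()`, and because `prcomp()` was fitted on standardized data its stored center is essentially zero, so the two steps are consistent.

```r
# Fit PCA on a "training" subset and project a held-out subset
train <- USArrests[1:40, ]
test  <- USArrests[41:50, ]

pca <- prcomp(train, center = TRUE, scale. = TRUE)

# Projection via predict() vs. the explicit matrix computation
via_predict <- predict(pca, newdata = test)
manual <- scale(as.matrix(test), center = pca$center, scale = pca$scale) %*% pca$rotation

all.equal(unname(via_predict), unname(manual))

# Keep the first k component scores (the document uses k = 12)
k <- 2
test_scores <- via_predict[, 1:k]
```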